1. Overview


PowerVR PCX2 is a member of the PowerVR 3D processor family of devices 
that specifically targets the Personal Computer (PC) environment. It is a 
highly integrated device that combines the PowerVR ISP (Image Synthesis 
Processor), the PowerVR TSP (Texture Shading Processor), a PCI Interface 
and a synchronous DRAM (SDRAM) controller all on one chip. PCX2 is fully
pin and software compatible with its predecessor, PCX1.


[Figure 1 - PowerVR PCX2 Simplified Block Diagram: Texture and Parameter SDRAM, SDRAM Controller, Image Synthesis Processor (ISP), Texture Shading Processor (TSP), PCI Interface (parameters and textures in, 3D rendered image out)]


The ISP subsystem performs hidden surface removal on a list of 3D objects. It
outputs, for each pixel in the image, the ID of the visible surface at that pixel.
This hidden surface removal, which is effectively equivalent to conventional
z-buffering, is carried out on-chip. As a result, PowerVR PCX2 does not require
z-buffer memory, yet it provides functionality equivalent to a 32-bit accurate
z-buffer. The ISP subsystem also performs shadow generation, indicating whether
or not the visible surface at each pixel is in shadow.


The TSP subsystem operates on the visible surface ID (Tag) produced by the ISP
to texture map and shade the surface correctly for display. The TSP supports
perspective correct texturing, smooth shading, multi-level transparency, and
fogging. Since the TSP need only process the visible surfaces, it minimises
accesses to the external SDRAM, which stores textures. Both the ISP and TSP
subsystems have built-in caches to ensure highly efficient and localised access
to rendering parameters. The PowerVR PCX2 PCI interface has been
designed to offer high performance master and slave capabilities. PowerVR
PCX2 fetches the ISP parameters from system memory as a PCI master, and
receives the TSP parameters and textures as a PCI slave. The PCI
interface has enough buffering to sustain high speed data transfers and to
cope with system latencies.


The SDRAM arbiter and controller allows the various subsystems, including the
ISP, TSP and PCI Interface, to share and access the external SDRAM in an
optimal manner. This interface has been designed to operate with the
cost-effective SDRAM-Lite specification. As shown in Figure 1, PCX2 obtains its
rendering parameters over PCI and delivers a fully textured, shaded 3D image
to a PCI address. In the normal operational mode, the destination address for
the output image is the PC’s 2D graphics (VGA controller) frame buffer.


2. Feature Summary
(* denotes features added since PCX1)


2.1 PCX1 compatibility


• Pin compatible with PCX1.
• Backward compatible with PCX1.


2.2 PCI Interface


• PCI-66 operation.*
• High performance master and slave operation.
• PCI 2.1 compatible.


2.3 ISP Subsystem


• On-chip hidden surface removal (no need for z-buffer memory).
• Fully correct, pixel accurate hidden surface removal.
• 32-bit depth precision.
• True shadow generation.
• Per pixel fogging.
• Floating point setup.*
• Hardware parameter clipping.*
• 32 processor elements.


2.4 TSP Subsystem


2.4.1 Texturing


• 1 pixel per clock peak texturing rate.
• Perspective correct (division per pixel).
• Bilinear texture interpolation.*
• MIP mapped anti-aliasing.
• 16/8 bit textures.
• Translucent textures.
• 1, 2 or 4MB texture memory support.


2.4.2 Shading


• 1 pixel per clock peak smooth shading rate.
• Smooth shadows.
• Flat shading with offset ‘highlight’.


Effects


• Real-time full shadows.
• Advanced lighting including search light.
• Multiple levels of translucency.
• Global and translucent textures.
• Exponential colored fog function.


Output Image


• 16 bit format support:
    - RGB 565 packed.
    - RGB 555 packed.
    - BGR as above.
    - 24 bit dithered or truncated to 16 bit.
• 24 bit format support:
    - RGB 888 packed and unpacked.
    - BGR 888 packed and unpacked.
• Little or big endian pixel formats.
• Masked writing for image compositing.*
• Up to 1024 x 1024 rendered image resolution.
• Addressable destination image in the PCI space.
3. Architectural Details


3.1 PowerVR PCX2 Overall Operation and Description 


The detailed block diagram of the PowerVR PCX2, showing all the key
subsystems, is given in Figure 2.


[Figure 2 - PowerVR PCX2 Internal Diagram: SDRAM Interface, SDRAM Interface Control, Memory Arbiter, ISP Parameter Management, Texture & Shading Unit (Pre-calculation, Iteration Pipelines, Parameter Cache), Pixel Combiner, Accumulation Buffer, PCI Interface (Data Buffer 256 Bytes, PCI Control, PCI Image Buffer 1kbyte)]


PowerVR PCX2 acts as a PCI master when:


• Accessing ISP parameters from the system’s main memory.
• Transferring the rendered image over PCI to the system’s graphics
  memory.


The ISP subsystem incorporates a 12KB on-chip cache which allows rendering 
parameters such as plane and polygon data to be loaded directly. This cache 
can be operated in a double buffered mode to allow parameter loading and 
rendering operation to be overlapped. 


PowerVR PCX2 acts as a PCI slave for:


• Texture memory access.
• TSP parameter loading.
• Register access.


The TSP parameters are preloaded into the external SDRAM, which also stores
textures. The TSP subsystem incorporates a 4KB parameter cache which
fetches these parameters on a cache miss. The TSP parameters are normally
double buffered in the external SDRAM, allowing parameter loading and
actual rendering to overlap.


The ISP divides the screen into segments (tiles) and operates on them one at a
time. This is shown in Figure 3.


[Figure 3 - Screen Tiling]


The tiles are N x M pixels, with N being a multiple of 32 and M being an
integer greater than 1. Typical tile sizes include 64 x 64, 64 x 32 and 32 x 32.
There are many advantages to this tiling approach. As shown in Figure 4,
dividing the scene to be rendered into regions allows objects in the scene to be
associated with the tiles. This means that certain tiles, as depicted in Figure 4,
will have little or no complexity associated with them (sky, field), while other
tiles can contain substantial detail (helicopter, trucks). This approach allows
the processing power to be concentrated where it is needed. It also


[Figure 4 - Advantages of Using Tiling]


The ISP loads the parameters associated with the objects in each tile in turn
into its on-chip cache. It then uses these parameters to perform hidden surface
removal on the associated planes/polygons. As shown in Figure 5, for every
“pixel” in a tile on the viewport, the ISP performs a ray casting operation
which determines, among all the surfaces associated with that tile, the surface
that is closest (and therefore visible) to the viewer.


[Figure 5 - ISP’s Hidden Surface Removal: Viewport, Object]


The ISP operates on infinite planes. These can be used to represent conventional
polygon meshes as well as convex objects made up of such infinite planes.


Each surface is represented by 3 parameters (A, B and C) such that, for a point
p = (x, y) on the viewport, the depth of the corresponding point on the surface is:

$$\text{Depth} = \frac{1}{\text{Distance}} = Ax + By + C \qquad \text{(Eqn 1a)}$$

For the special case of a perpendicular plane the above equation becomes:

$$\text{Depth} = \begin{cases} +\text{MAX} & \text{if } Ax + By + C \ge 0 \\ -\text{MAX} & \text{if } Ax + By + C < 0 \end{cases} \qquad \text{(Eqn 1b)}$$

where x and y are co-ordinates on the viewport.
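As an illustration of Eqns 1a and 1b, the following C sketch evaluates plane depth and performs a deliberately simplified hidden surface removal over one 32-pixel span, keeping for each pixel the tag of the plane with the largest depth (that is, the smallest distance). It assumes every plane is a forward-facing, visible plane with no bounding edges; the real PEs execute the instruction set in Table 1, and the type and function names (`plane_t`, `hsr_span` and so on) are hypothetical.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical plane record: depth = A*x + B*y + C (Eqn 1a). Depth is
 * 1/distance, so a larger value means the surface is nearer. */
typedef struct {
    float a, b, c;
    uint32_t tag;   /* surface ID that would be passed on to the TSP */
} plane_t;

/* Eqn 1a: depth of an ordinary plane at viewport pixel (x, y). */
static float plane_depth(const plane_t *p, float x, float y)
{
    return p->a * x + p->b * y + p->c;
}

/* Eqn 1b: a perpendicular plane only reports which side a pixel is on. */
static float perp_depth(const plane_t *p, float x, float y)
{
    return (plane_depth(p, x, y) >= 0.0f) ? FLT_MAX : -FLT_MAX;
}

/* Simplified HSR for the 32 pixels starting at (x0, y): each "PE" keeps
 * the tag of the plane with the largest depth seen so far. */
static void hsr_span(const plane_t *planes, int n, int x0, int y,
                     uint32_t tags_out[32])
{
    for (int pe = 0; pe < 32; pe++) {
        float best = -FLT_MAX;   /* nothing seen yet: infinitely far */
        uint32_t best_tag = 0;   /* 0 used here as a "background" tag */
        for (int i = 0; i < n; i++) {
            float d = plane_depth(&planes[i], (float)(x0 + pe), (float)y);
            if (d > best) {
                best = d;
                best_tag = planes[i].tag;
            }
        }
        tags_out[pe] = best_tag;
    }
}
```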
Figure 6 (a) shows convex objects bounded by infinite planes.


[Figure 6 - ISP’s Object Primitives: (a) Intersecting convex objects; (b) Concave object broken down into two convex objects; (c) A triangle or polygon made up from bounding an infinite surface with invisible planes]


Concave objects made up of infinite planes are broken down into convex 
objects (Figure 6 (b)). Triangles and polygons are formed simply by bounding 
a planar surface with perpendicular surfaces (Figure 6 (c)). Objects made up 
of such triangles/polygons need not be convex. 
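The following C sketch shows one way the three perpendicular planes bounding a screen-space triangle could be derived, so that Eqn 1b reports +MAX for every edge exactly when a pixel lies inside the triangle. The edge-function construction is a standard technique rather than one taken from this document, and the helper names are hypothetical.

```c
/* Coefficients of a perpendicular (edge) plane: A*x + B*y + C. */
typedef struct { float a, b, c; } edge_plane_t;

/* Hypothetical helper: plane through the screen edge (x0,y0)->(x1,y1);
 * A*x + B*y + C is zero everywhere on the edge's line. */
static edge_plane_t edge_through(float x0, float y0, float x1, float y1)
{
    edge_plane_t e;
    e.a = y0 - y1;
    e.b = x1 - x0;
    e.c = -(e.a * x0 + e.b * y0);
    return e;
}

/* Build three edge planes whose non-negative side is the triangle
 * interior, whichever way the vertices are wound. */
static void triangle_edges(const float vx[3], const float vy[3],
                           edge_plane_t out[3])
{
    for (int i = 0; i < 3; i++) {
        int j = (i + 1) % 3;
        out[i] = edge_through(vx[i], vy[i], vx[j], vy[j]);
    }
    /* Edge 0 evaluated at vertex 2 is twice the signed area. */
    float area2 = out[0].a * vx[2] + out[0].b * vy[2] + out[0].c;
    if (area2 < 0.0f) {
        for (int i = 0; i < 3; i++) {   /* flip so the inside is >= 0 */
            out[i].a = -out[i].a;
            out[i].b = -out[i].b;
            out[i].c = -out[i].c;
        }
    }
}

/* Eqn 1b per edge: inside when every edge plane would report +MAX. */
static int inside_triangle(const edge_plane_t e[3], float x, float y)
{
    for (int i = 0; i < 3; i++)
        if (e[i].a * x + e[i].b * y + e[i].c < 0.0f)
            return 0;
    return 1;
}
```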


The ISP subsystem contains 32 processing elements (PE’s) that operate on 32
horizontally adjacent pixels. The pre-calculation unit calculates the initial
depth, while each PE calculates the correct depth for the pixel it is operating
on.


For each pixel, the ISP subsystem outputs the ID of the visible surface (Surface
Tag) at that point. If there are any shadow lights in the scene, it also identifies
the pixels which are in shadow and marks them accordingly. The depth accuracy
of the ISP is 32 bits; this depth is used to produce an 8 bit logarithmic depth
value which is passed to the TSP.


The TSP uses the surface ID tag, shadow tag, and the depth information to 
carry out texturing, shading, shadowing and fogging. It also supports 
multi-level translucency which can be either plane/polygon wide (global) or be 
associated with individual texels (local). 


The surface tags, output by ISP for a given block, are run length coded as 
[surface tag : span length] and stored in a span fifo. The other input parameters 
are stored in holding fifos on a per pixel basis. 
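A minimal C sketch of the [surface tag : span length] run-length coding described above. The `span_t` layout and the `encode_spans` name are illustrative only; the field widths of the actual span FIFO are not specified here.

```c
#include <stddef.h>
#include <stdint.h>

/* One [surface tag : span length] entry (illustrative field widths). */
typedef struct {
    uint32_t tag;
    uint32_t length;
} span_t;

/* Run-length code n per-pixel tags into spans and return the span count.
 * 'spans' must have room for n entries, the worst case. */
static size_t encode_spans(const uint32_t *tags, size_t n, span_t *spans)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count > 0 && spans[count - 1].tag == tags[i]) {
            spans[count - 1].length++;   /* extend the current run */
        } else {
            spans[count].tag = tags[i];  /* start a new run */
            spans[count].length = 1;
            count++;
        }
    }
    return count;
}
```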


Spans are compared against the TSP’s parameter cache. Since the tags are
input in a tiled format, both the vertical and horizontal coherency of a given
scene can be exploited by the cache. The TSP cache size is 4KB, which
corresponds to 128 average complexity surface parameter lists. A 2-way
set-associative cache architecture is implemented. If the cache hits, the set of
surface parameters corresponding to that tag is passed directly to the
pre-calculation unit. If the cache misses, the TSP parameter list is fetched from
the external SDRAM, stored, and forwarded to the pre-calculation unit. Since the
cache is designed for a high hit rate, the SDRAM bus bandwidth consumed by
cache misses is small.
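To make the cache organisation concrete, here is a C sketch of a 2-way set-associative lookup with least-recently-used replacement. 4KB shared among 128 parameter lists averages 32 bytes per list, and splitting 128 entries into 2 ways gives 64 sets. The index function (tag modulo 64), the LRU policy and all names are assumptions; a miss merely records the tag where the hardware would fetch the list from SDRAM.

```c
#include <stdbool.h>
#include <stdint.h>

#define TSP_WAYS 2
#define TSP_SETS 64   /* assumption: 128 entries split across 2 ways */

typedef struct {
    uint32_t tag[TSP_SETS][TSP_WAYS];
    bool     valid[TSP_SETS][TSP_WAYS];
    uint8_t  lru[TSP_SETS];   /* index of the least recently used way */
} tsp_cache_t;

/* Look up a surface tag. On a miss the LRU way is refilled, standing in
 * for the SDRAM fetch. Returns true on a hit. */
static bool tsp_cache_access(tsp_cache_t *c, uint32_t surface_tag)
{
    uint32_t set = surface_tag % TSP_SETS;   /* assumed index function */
    for (int w = 0; w < TSP_WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == surface_tag) {
            c->lru[set] = (uint8_t)(1 - w);  /* other way becomes LRU */
            return true;
        }
    }
    int victim = c->lru[set];
    c->tag[set][victim] = surface_tag;
    c->valid[set][victim] = true;
    c->lru[set] = (uint8_t)(1 - victim);
    return false;
}
```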


The texture and shading unit consists of pre-calculation followed by iteration 
stages. 
The equations for perspective texture mapping are:

$$u = \frac{ax + by + c}{px + qy + r}, \qquad v = \frac{dx + ey + f}{px + qy + r} \qquad \text{(Eqn 2)}$$

where: ‘u’ and ‘v’ are the 2D co-ordinates in texture space
       ‘x’ and ‘y’ are the 2D co-ordinates in screen space
       ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘p’, ‘q’ and ‘r’ are coefficients used in
       the mapping process


The equation for linear smooth shading, where the intensity function changes
linearly with the ‘x’ and ‘y’ co-ordinates, is:

$$I = T_2\,x_{\text{local}} + T_1\,y_{\text{local}} + T_0 \qquad \text{(Eqn 3)}$$

where: ‘x_local’ and ‘y_local’ are the pixel co-ordinates relative to the
       centre of the surface being shaded
       ‘T0’, ‘T1’ and ‘T2’ are shading constants.


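Both texture and shading calculations can therefore be broken down into a
pre-calculation and an iteration step as ‘x’ is incremented, i.e.

$$u(x,y) = \frac{ax + by + c}{px + qy + r} = \frac{\alpha(x,y)}{\beta(x,y)}, \qquad u(x+1,y) = \frac{\alpha(x,y) + a}{\beta(x,y) + p} \qquad \text{(Eqn 4)}$$

$$v(x,y) = \frac{dx + ey + f}{px + qy + r} = \frac{\gamma(x,y)}{\beta(x,y)}, \qquad v(x+1,y) = \frac{\gamma(x,y) + d}{\beta(x,y) + p} \qquad \text{(Eqn 5)}$$

$$I(x,y) = T_2\,x + T_1\,y + T_0 \;\Rightarrow\; I(x+1,y) = I(x,y) + T_2 \qquad \text{(Eqn 6)}$$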
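The pre-calculation/iteration split for perspective texturing can be sketched in C as follows: the numerators and the shared denominator of u = (ax+by+c)/(px+qy+r) and v = (dx+ey+f)/(px+qy+r) are evaluated once at the start of a span, then advanced by a, d and p per pixel, leaving one division per pixel. Floating point arithmetic and the names `persp_coeffs_t` and `texture_span` are assumptions for clarity; the hardware's number formats are not described here.

```c
/* Coefficients of u = (ax+by+c)/(px+qy+r) and v = (dx+ey+f)/(px+qy+r). */
typedef struct {
    float a, b, c;   /* u numerator */
    float d, e, f;   /* v numerator */
    float p, q, r;   /* shared denominator */
} persp_coeffs_t;

/* Texture co-ordinates for 'len' pixels starting at (x0, y). */
static void texture_span(const persp_coeffs_t *k, int x0, int y, int len,
                         float *u_out, float *v_out)
{
    /* Pre-calculation: alpha, gamma and beta at the span start. */
    float alpha = k->a * x0 + k->b * y + k->c;
    float gamma = k->d * x0 + k->e * y + k->f;
    float beta  = k->p * x0 + k->q * y + k->r;

    for (int i = 0; i < len; i++) {
        u_out[i] = alpha / beta;   /* one division per pixel */
        v_out[i] = gamma / beta;
        alpha += k->a;             /* step from x to x + 1 */
        gamma += k->d;
        beta  += k->p;
    }
}
```

For example, with u = 2x/(x+1) and v = 8/(x+1), a span starting at x = 1 yields u = 1, v = 4 at its first pixel and u = 1.5, v = 2 at its third.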


The pre-calculation unit takes the X,Y address of the current span, along with 
the current parameter list and calculates initial values for the texture and 
shading iteration units as specified in the parameter list control word. It 
consists of a multiply accumulate array controlled by a microsequencer. 


A pre-calculation buffer allows cache miss penalties and pre-calculation 
overheads to overlap the texture and shading iteration process, thus increasing 
system throughput. 
The texture iteration pipeline performs the hyperbolic interpolation in u and v
and the MIP map D calculation. The texture pipeline can output two addresses
per clock cycle, one for each of the two appropriate MIP maps. A small on-chip
texture cache allows almost one MIP mapped texturing operation per clock
cycle to be achieved. The texture maps are arranged so as to minimise page
breaks between consecutive map accesses.


The shading iteration pipeline and shadow iteration pipeline allow 
simultaneous evaluation of a global lighting intensity and a shadow light 
intensity for smooth shaded surfaces. The shading pipelines output intensity 
values which are forwarded to the pixel combiner. 


Fog values are forwarded to the pixel combiner where they are used to 
interpolate between the calculated pixel color and a fog color. 


The pixel combiner implements the final R,G,B color evaluation of each 
screen pixel. A base color or texture color is input and multiplied by the 
conditional sum of shadow and shading intensity values from the shading 
pipelines as dictated by the shadow bits. Any highlight offsets are also 
summed in. The pixels are then fogged and transferred to the accumulation 
buffer. 
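One color channel passing through the pixel combiner might be modelled as in this C sketch. The 8-bit intensity scale (255 = full light), the rule that the shadow light contributes only when the pixel's shadow bit is clear, and the clamping are assumptions; the fog step uses the fog module's scale described under Fogging (0 = complete fog, 256 = no fog). `combine_channel` is a hypothetical name.

```c
#include <stdint.h>

/* Sketch of one channel: modulate, add highlight offset, then fog. */
static uint8_t combine_channel(uint8_t base, uint8_t shade,
                               uint8_t shadow_light, int in_shadow,
                               uint8_t highlight, unsigned fog,
                               uint8_t fog_color)
{
    /* Conditional sum of shading and shadow-light intensities. */
    unsigned intensity = shade + (in_shadow ? 0u : shadow_light);
    if (intensity > 255u)
        intensity = 255u;

    /* Base or texture color times intensity, plus highlight offset. */
    unsigned c = (base * intensity) / 255u + highlight;
    if (c > 255u)
        c = 255u;

    /* Fog: interpolate between the pixel color and the fog color. */
    c = (c * fog + fog_color * (256u - fog)) / 256u;
    return (uint8_t)c;
}
```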


The accumulation buffer allows translucent objects to be rendered using a 
multi-pass technique. First the opaque surfaces in a 32 pixel block are 
processed and transferred to the accumulation buffer as described above. The 
translucent surfaces are then processed in a similar manner, with the exception 
that all translucent surfaces will have an ‘alpha’ component as part of their 
parameter list or texture data which is used to mix between the current pixel 
and the corresponding background pixel stored in the accumulation buffer (the 
mix is only performed if the translucent pixel is in front of the pixel stored in 
the accumulation buffer). When all translucent pixels have been processed the 
contents of the accumulation buffer are forwarded to the pixel formatter. 
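The translucent pass can be sketched in C as a per-pixel blend that is skipped when the translucent surface lies behind the stored pixel. Depth follows Eqn 1a (larger means nearer); the 8-bit alpha scale (255 = opaque), leaving the stored depth unchanged after a blend, and the names used are assumptions rather than documented PCX2 behaviour.

```c
#include <stdint.h>

/* One accumulation buffer entry (hypothetical layout). */
typedef struct {
    uint8_t r, g, b;
    float   depth;   /* 1/distance of the stored surface */
} accum_pixel_t;

/* Mix a source channel over the stored background by alpha. */
static uint8_t mix8(uint8_t src, uint8_t dst, uint8_t alpha)
{
    return (uint8_t)((src * alpha + dst * (255 - alpha)) / 255);
}

/* Blend a translucent pixel only if it is in front of the stored pixel.
 * The stored depth is deliberately left as it was (an assumption). */
static void accum_translucent(accum_pixel_t *px, uint8_t r, uint8_t g,
                              uint8_t b, uint8_t alpha, float depth)
{
    if (depth <= px->depth)
        return;   /* behind (or level with) the stored pixel: no mix */
    px->r = mix8(r, px->r, alpha);
    px->g = mix8(g, px->g, alpha);
    px->b = mix8(b, px->b, alpha);
}
```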


The pixel formatter converts the final output of the accumulation buffer to one
of the many 16 or 24 bit pixel formats. The selected format is chosen to be
compatible with the destination frame buffer.
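For the 16 bit formats, truncation-based packing could look like the following C sketch. The bit layouts are the conventional ones for RGB 565 and RGB 555 (red in the most significant bits), the dithering option is not modelled, and `swap16` shows the byte swap a big endian destination would need; all names are hypothetical.

```c
#include <stdint.h>

/* RGB 565: 5 bits red, 6 bits green, 5 bits blue, by truncation. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* RGB 555: 5 bits per channel, top bit left clear. */
static uint16_t pack_rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* Byte swap for a big endian 16 bit destination. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

The BGR variants follow by passing the channels in swapped order, e.g. `pack_rgb565(b, g, r)`.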


The formatted pixel packet is queued in the 1KB PCI image buffer, from which it
is later transferred over PCI to the frame buffer.


3.2 ISP Subsystem 


The ISP subsystem performs the hidden surface removal and shadow 
generation, manages ISP parameter loading and the associated on-chip ISP 
parameter cache. It consists of three main components: 


1. The ISP processing engine, which consists of a pre-calculation unit and an
   array of processing elements (PE’s).
2. The ISP parameter management unit.
3. The ISP parameter cache.
3.2.1 ISP Processing Unit


Pre-calculation and Processing Array


The ISP processing unit consists of a pre-calculation unit that calculates the 
initial depth. The processing array contains 32 processing elements (PE’s) that 
perform the hidden surface removal for 32 adjacent pixels. The ISP 
plane/polygon parameters are generally cached into the on-chip memory and 
are used by the processing engine in the hidden surface removal operation. 


Associated with each surface is an 18-bit identifier which is passed to the TSP
so that it can fetch the appropriate shading and texture parameters. There is
also a 4-bit instruction which is used by the PE’s. The instruction identifies the
type of surface to be processed; typical surface attributes are forward/reverse
and visible/invisible. A complete list of all the instructions used by the PE’s is
given in the following table.


Table 1 - PE Instructions


Instruction
-----------
forw_visib
forw_invis
forw_perp
forw_visib_fp
forw_invis_fp
forw_perp_fp
rev_visib
rev_invis
rev_replace_if
test_shad_forw
test_shad_perp
test_light_rev
begin_trans


After all the surfaces have been processed each PE contains the surface 
identifier, 32-bit depth value, and shadow flag, for the closest visible surface. 
The 32-bit depth value is converted to an 8-bit fog value which is used in the 
TSP to blend the fog color with the actual color of the visible surface. This 
conversion is performed in the fog module. 
Fogging


The reason that a conversion is necessary is that in fog, light is attenuated by
a fixed percentage per meter. For example, if the light is attenuated by 1% per
meter, after 1 meter the light will be 0.99 of what it was. After 2 meters, the
light will be 0.99 × 0.99 = 0.99² ≈ 0.98 of the original. After ‘n’ meters, the
light will be 0.99ⁿ of the original, which can be written as a power of two:

$$0.99^{n} = 2^{\,n \log_2(0.99)}$$

As each color component is only 8 bits, the attenuation factor only needs to be
8 bits as well. So the resulting calculation is:

$$\text{INT}\left(2^{\,n \log_2(0.99)} \times 256\right)$$

where 0 is complete fog and 256 is no fog.


The fog module performs an approximation to the above function using a 
look-up table and it also allows the amount of fog in a scene to be set. 
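The fog formula can be evaluated directly, and also tabulated in the spirit of the fog module's look-up table, as in this C sketch. Repeated multiplication is used in place of 2^(n log2(0.99)), which is mathematically equal for integer n; the `attenuation` parameter stands in for the adjustable amount of fog, and the table of 256 integer distance steps is an assumption, not the PCX2 table format.

```c
/* Exact fog value for an integer distance of n units:
 * INT(attenuation^n * 256), where 256 = no fog and 0 = complete fog. */
static int fog_value(unsigned n, double attenuation)
{
    double remaining = 1.0;   /* fraction of light left */
    for (unsigned i = 0; i < n; i++)
        remaining *= attenuation;
    return (int)(remaining * 256.0);
}

/* Hypothetical table: one entry per distance step, so a lookup replaces
 * the exponentiation at render time. */
#define FOG_LUT_SIZE 256
static void build_fog_lut(int lut[FOG_LUT_SIZE], double attenuation)
{
    double remaining = 1.0;
    for (int i = 0; i < FOG_LUT_SIZE; i++) {
        lut[i] = (int)(remaining * 256.0);
        remaining *= attenuation;
    }
}
```

With 1% attenuation per unit, this gives 256 at distance 0, 253 at distance 1, 250 at distance 2 and 93 at distance 100.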


ISP/TSP Interface Module 


This module provides buffering and handshaking between the ISP and the
TSP. The buffer in the output module can store 64 pixels. Each packet of 32
pixels has the X,Y location of the packet added to its front. Once the FIFO is
full, the ISP continues processing surfaces until it is ready to output a further
pixel packet; at this point it stalls until there is room for another pixel
packet.


3.2.2 ISP Parameter Management Unit 


Rather than process every object for every pixel, the image plane is partitioned 
into a number of tiles. For a given tile only those objects which are within that 
tile are processed. For PCX2 to know which objects are in which tile there is a 
second list which describes for each tile the position and size of the tile and the 
objects within it. Both lists are held in the PC’s main memory. 
[Figure 7 - Object Pointer List and Parameter List: the Object Pointer List contains Tile Identifier entries, each followed by pointers to objects (Object X, Y, W, Z); each pointer refers to that object's parameters in the Object Parameters list]


The first list is the object pointer list, a linked list in which each entry is a
single 32-bit word; the start address of the list is given by a register in
PCX2. Each entry in the list takes one of the following forms: a tile
identifier, an object pointer, or a link pointer. The tile identifier gives the size
and position of the tile. The object pointer gives the location and the number
of planes of an object to be rendered. The link pointer allows the object pointer
list to be composed of a number of linked tables. A tile identifier entry is
always followed by a number of object pointer entries, and the list has to start
with a tile identifier entry. A single bit in each 32-bit word indicates to PCX2
whether the current entry is to be interpreted as a tile identifier or an object
pointer. The format of the 32-bit word in each case is shown below:


Tile Identifier

| 0 | 1 | Tile Y Position | Tile X Position | Tile Y Size | Tile X Size |


Object Pointer

| Last Object of Last Tile Flag | Number of Planes in Object | Address of Object |


Object Pointer Link

| x | Address of next Object Pointer |


• Tile X, Y Sizes


The tile Y size specifies the height of the tile. For a tile of height N a value of 
N-1 should be used in this field. 


The tile X size specifies the tile width in multiples of 32 pixels. For a tile
width of N*32, this field should be set to N-1.


• Tile X, Y Positions


These specify the top left hand corner pixel of a particular tile. The tile X 
position is expressed as a multiple of 32 while the tile Y position is in pixel 
numbers. 


• Address of Object


Specifies the relative start address of the parameters associated with an object,
multiplied by the number of words in a plane, i.e. 3, or 4 if floating point
set-up is used. This is fed through the page look up table to obtain the address
in the PC’s main memory.


• Address of next object pointer


This is a link to the start of the next object pointer table. It is a relative
address multiplied by the number of words in a plane, i.e. 3, or 4 if floating
point set-up is used. This is fed through the page look up table to obtain the
address in the PC’s main memory.


• Number of Planes


Indicates the number of planes in the object associated with the object pointer. 
Larger objects are specified by using sub-objects and multiple object pointers. 


• Last Object of Last Tile Flag


This flag indicates the end of all the region data. It is set to 1 for the last 
object of the last region. 
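The tile size and position encodings above can be summarised in a short C sketch that computes the field values for a tile given in pixels. It deliberately stops short of packing a 32-bit word, because the bit positions of the fields are not recoverable from the format diagrams; `tile_fields` and `tile_fields_t` are hypothetical.

```c
#include <stdbool.h>

/* Field values (not bit positions) for one tile identifier entry. */
typedef struct {
    unsigned x_pos;    /* left edge / 32 */
    unsigned y_pos;    /* top edge, in pixels */
    unsigned x_size;   /* width / 32 - 1 */
    unsigned y_size;   /* height - 1 */
} tile_fields_t;

/* Tile at pixel (x, y) of size w x h. Returns false if the width or the
 * X position is not a multiple of 32, or a dimension is zero. */
static bool tile_fields(unsigned x, unsigned y, unsigned w, unsigned h,
                        tile_fields_t *out)
{
    if (w == 0 || h == 0 || w % 32 != 0 || x % 32 != 0)
        return false;
    out->x_pos  = x / 32;
    out->y_pos  = y;
    out->x_size = w / 32 - 1;
    out->y_size = h - 1;
    return true;
}
```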
3.2.3 ISP Parameters and their Caching


A tile is rendered by reading the plane parameters for that tile into an on-chip, 
1024 x 96bit, Parameter Cache, then processing the parameters in this cache. 
To improve performance of the ISP, a double-buffering technique is employed 
with this cache such that it is split into 2 x (512 x 96bit) caches. The caches 
are operated independently so that as one is being processed, the other is 
loaded with new tile parameters. 


The parameter management unit controls the requesting of parameters, and 
their subsequent writing to the caches. It also controls the double buffering of 
the caches and communicates with the ISP processing block giving details of 
which cache is complete and contains valid parameter data. Parameters are 
loaded by issuing burst requests to the PCI Interface. The subsequent 
parameter data is directed by the parameter management unit into the 
parameter cache. 


Since a complete tile is processed at any given time, to allow this
double-buffering to operate, the tile parameters must reside completely within
one of the two caches (i.e. the tile must have fewer than 513 planes). If a tile
requires more parameters than will fit into a single cache, the parameter
management unit automatically switches to a single buffer mode. In this case,
the cache is reconfigured as a single, 1024 x 96bit, cache. PCX2 switches back to
double-buffered mode as soon as it encounters a tile with fewer than 513 planes.


If a tile consists of greater than 1024 plane parameters then the remaining 
parameters are loaded into an area of local SDRAM. In this case, the 
parameter management unit fills the parameter cache before directing all 
further plane parameters to the SDRAM via the SDRAM Interface. The 
parameter management unit then informs the processing block that the tile 
contains out-of-cache data. 
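The mode selection described above reduces to two thresholds, as this C sketch shows (the enum and function names are hypothetical). Each half of the cache holds 512 planes of 96 bits, so the whole cache is 1024 x 96 bits = 12KB, matching the cache size given in Section 3.1.

```c
/* Parameter cache mode implied by the number of planes in a tile. */
typedef enum {
    CACHE_DOUBLE_BUFFERED,     /* up to 512 planes: fits in one half */
    CACHE_SINGLE_BUFFERED,     /* 513..1024 planes: whole cache */
    CACHE_SINGLE_PLUS_SDRAM    /* over 1024 planes: overflow to SDRAM */
} cache_mode_t;

static cache_mode_t cache_mode_for_tile(unsigned planes,
                                        unsigned *planes_in_sdram)
{
    *planes_in_sdram = 0;
    if (planes <= 512)
        return CACHE_DOUBLE_BUFFERED;
    if (planes <= 1024)
        return CACHE_SINGLE_BUFFERED;
    *planes_in_sdram = planes - 1024;   /* remainder goes to local SDRAM */
    return CACHE_SINGLE_PLUS_SDRAM;
}
```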


[Figure 8 - Parameter Loading modules of PCX2: SDRAM Interface, Parameter Cache (single 1024 x 96-bit or double 512 x 96-bit), Processing Block, Parameter Management Logic, PCI Interface]


When the processing block has finished processing data in a given cache, it
hands it back to the parameter management unit, which begins to refill it with
parameters for the next tile. In single cache mode (more than 512 planes in a
tile), once the first 512 planes have been processed, the first half of the
cache is handed over to be refilled before the remaining planes (in the second
half of the cache) have been processed.


To further improve performance by reducing the PCI access latency for
fetching each Object Pointer, a burst of sixteen Object Pointers is read into an
on-chip buffer. Once the parameters for each of these sixteen Object Pointers
have been cached, the next batch of sixteen pointers is fetched.


As mentioned briefly above, each plane parameter consists of a 96bit value. 94 
of these bits are defined in the parameter list, whilst the remaining two are 
assigned by the ISP whilst loading the data. The 96bit parameters are stored in 
the list in a series of three 32bit words. The format of these words is shown 
below: 


Word 0:  | Reserved (31..30) | Tag Upper Bits (6 bits) | “A” (Sfloat, 20 bits) |
Word 1:  | Tag Lower Bits (12 bits) | “B” (Sfloat, 20 bits) |
Word 2:  | “C” (32 bits) |


Figure 9 - Plane Parameter Format


The three words are stored in main memory in increasing address order.
• A, B and C are the constants defined by Equation 1.
• C is a signed 32 bit integer.
• A and B are 20 bit floats with the following format: the mantissa is a 15 bit
  unsigned integer, a sign bit determines the sign of the mantissa, and the
  exponent is a 4 bit unsigned value in the range 0-14.
• The tag upper and lower bits together specify an 18 bit tag which identifies
  each plane.
• The instruction field defines the ISP instruction, which indicates the type of
  surface and/or the operation to be executed. These include surface
  attributes such as forward/reverse and visible/invisible as well as shadow
  and translucency control. Details of these instructions are given later in this
  document.
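The three-word plane format can be packed as in the following C sketch. Figure 9 accounts for only 28 of the 32 bits of word 0 (2 reserved, 6 tag, 20 for A); this sketch assumes the 4-bit instruction field occupies the remaining bits 29..26, which is an inference rather than a documented layout. A and B are taken as already-encoded 20-bit floats, and `pack_plane` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed layout (inferred, not confirmed by this datasheet):
 *   word0: [31:30] reserved | [29:26] instruction | [25:20] tag[17:12] | [19:0] A
 *   word1: [31:20] tag[11:0] | [19:0] B
 *   word2: C (signed 32-bit)
 * The two reserved bits are left zero; the ISP assigns them on load. */
static void pack_plane(uint32_t tag18, uint32_t instr4, uint32_t a20,
                       uint32_t b20, int32_t c, uint32_t words[3])
{
    words[0] = ((instr4 & 0xFu) << 26)
             | (((tag18 >> 12) & 0x3Fu) << 20)
             | (a20 & 0xFFFFFu);
    words[1] = ((tag18 & 0xFFFu) << 20) | (b20 & 0xFFFFFu);
    words[2] = (uint32_t)c;
}
```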
Two restrictions are made on the plane parameter data:
1. The maximum number of planes allowed per object is restricted to 512.
2. There must be a minimum of 3 planes in any region.
3.2.4 Floating point setup unit 


The Floating point setup unit supports data directly in IEEE floating point 
format and undertakes conversion of the data to the ISP’s internal parameter 
data format as described in the previous section. 


Object data is organised into 4 words per plane as shown below: 


[Fields shown: Reserved | Tag (18 bits) | Instruction (4 bits)]
Figure 10 - IEEE Plane Parameter Format


The four words are stored in main memory in increasing address order. 


The FPS also undertakes scaling of the A, B & C parameters for perpendicular 
planes to ensure that the equation ax+by+c < 1 is satisfied. 
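One way such scaling might work is sketched in C below, reading the condition as a bound on the magnitude of Ax + By + C. Since the expression is linear, its largest magnitude over a rectangle occurs at a corner, and scaling all three coefficients by a positive factor leaves the sign, and therefore the Eqn 1b result, unchanged. The region (a w x h viewport), the 0.5 headroom factor and the names are assumptions, not the FPS algorithm.

```c
/* Perpendicular plane coefficients: A*x + B*y + C. */
typedef struct { float a, b, c; } perp_plane_t;

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Scale so |A*x + B*y + C| < 1 over [0,w] x [0,h]. */
static void scale_perp_plane(perp_plane_t *p, float w, float h)
{
    const float xs[4] = {0.0f, w, 0.0f, w};
    const float ys[4] = {0.0f, 0.0f, h, h};
    float max_mag = 0.0f;
    for (int i = 0; i < 4; i++) {   /* extremes lie at the corners */
        float v = absf(p->a * xs[i] + p->b * ys[i] + p->c);
        if (v > max_mag)
            max_mag = v;
    }
    if (max_mag >= 1.0f) {
        float s = 0.5f / max_mag;   /* positive: preserves the sign */
        p->a *= s;
        p->b *= s;
        p->c *= s;
    }
}
```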


3.2.5 ISP Edge sharing 


The ISP is able to store the results of 3 processed perpendicular planes and 
reuse these for later objects. This is particularly useful when processing quads 
and strips or fans of triangles. 


Perpendicular planes in PCX1 have no use for the surface tag, so it is
otherwise set in software to a diagnostic value. When edge sharing is enabled
(by setting bit 1 in the FLOATMODE register), the ISP uses the 2 low bits of the
tag field in a perpendicular plane to decide which perpendicular plane of the
previous object to re-use in the current object. The other tag bits identify
which polygon a plane belongs to, in increasing count order; this allows the
ISP to check that a request to use a previous edge does not refer to an object
in another object group, which would generate a spurious object.
Therefore, in the above diagram, where each polygon was formerly described
by an object of 4 planes (i.e. a surface plane followed by 3 perpendicular
planes starting at the left and going clockwise), each perpendicular plane
would now use the tag bits to identify which polygon it belongs to and whether
any edges are shared, as shown below:


P2b perp plane (reuse 1c)
P2c perp plane
Pointer to P4
P4b perp plane (reuse 3c)
P4c perp plane
P5a perp plane
P5b perp plane
The 2 lowest bits of the tag field are used as follows:


00  don’t reuse planes
01  reuse stored plane 1
10  reuse stored plane 2
11  reuse stored plane 3


Figure 12 - 4th Word in Plane Descriptor when sharing edges
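Decoding the edge-sharing fields of a perpendicular plane's 18-bit tag is a matter of masking, as this C sketch shows. Treating the remaining 16 tag bits as a plain polygon count follows the description above; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Edge-sharing fields carried in a perpendicular plane's tag. */
typedef struct {
    unsigned reuse;        /* 0 = don't reuse, 1..3 = stored plane 1..3 */
    uint32_t polygon_id;   /* tag bits 17..2: polygon count */
} edge_share_t;

static edge_share_t decode_edge_share(uint32_t tag18)
{
    edge_share_t s;
    s.reuse = tag18 & 0x3u;                     /* 2 lowest bits */
    s.polygon_id = (tag18 >> 2) & 0xFFFFu;      /* remaining 16 bits */
    return s;
}
```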


3.2.6 ISP Parameters Page Look Up Table 


In PCX2, the ISP parameters are obtained from the host main memory via PCI,
and must be accessed via physical addresses. Windows 95 and NT handle
memory in 4KB pages. In order to allocate 1-2MB, the system software has to
request a large number of 4KB pages, and the chance of these being
contiguous is low, particularly in a PC with a small amount of system memory.


To overcome this limitation, a page table is included in PCX2, which allows 
the ISP to assemble memory from smaller pieces. This page table has 128 
entries, each of which can point to a block of physical memory which is 
aligned to 4k pages. All blocks must be the same size and are either 4KB, 
8KB, or 16KB. This gives maximum memory sizes for the ISP parameters of 
512KB, 1MB, or 2MB respectively. 
The hardware has the following architecture:


[Figure 13 - Page LUT Architecture: 19-bit Input ISP Word Pointer (bits 18..12, 11, 10, 9..0); Page Size Reg (16/8/4k) driving a Select of 00, 0b10 or b11b10; Page Table (128 x 12 bits); 12 + 2 bit Adder (12 bit result); 16 Meg Base (4 bits); Output PC Memory Byte Address (bits 23..12, 11..2, and 1..0 = 00)]


• Page Size Register

This states whether the “pages” for the ISP are 4KB, 8KB or 16KB.

• 16M Base Register

It is assumed that all the “pages” will be in the same 16MB of RAM, within
256MB.

• Page Table

This contains (up to) 128 page entries of 12 bits each, allowing access within
the 16MB of RAM. The size of all the pages is determined by the page size
register. These pages of memory must be locked to stop them being
swapped out by the operating system.
• Select & Adder


To allow the pages to be aligned to any 4KB boundary, the page size register
selects either 00, bit10, or bit11 bit10, to be added to the relevant page table
entry. Note that the “logical” address space of the ISP parameter memory must
be set up accordingly, so that the separate physical pages are contiguous in
logical memory space.


• Output Byte Address


This is assembled by concatenating the 16Meg Base address, the page
address bits from the page table, the (32bit) word offset within a page, and
“00” to change to a byte address.
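The address path of Figure 13 can be modelled in C as follows. This is one reading of a partly illegible diagram: it assumes the table index is always word-pointer bits 18..12 and that the page size only controls which of bits 11..10 reach the adder. The function and enum names are hypothetical.

```c
#include <stdint.h>

typedef enum { PAGE_4K, PAGE_8K, PAGE_16K } page_size_t;

/* Translate a 19-bit ISP word pointer to a PC memory byte address. */
static uint32_t isp_word_to_byte_addr(uint32_t word_ptr,
                                      const uint16_t table[128],
                                      page_size_t size,
                                      uint32_t base16m)
{
    uint32_t index = (word_ptr >> 12) & 0x7Fu;   /* 128-entry table */
    uint32_t sel;
    switch (size) {
    case PAGE_16K: sel = (word_ptr >> 10) & 0x3u; break;   /* bit11 bit10 */
    case PAGE_8K:  sel = (word_ptr >> 10) & 0x1u; break;   /* 0 bit10 */
    default:       sel = 0u;                      break;   /* 00 */
    }
    uint32_t page4k = (table[index] + sel) & 0xFFFu;   /* 12-bit result */
    return ((base16m & 0xFu) << 24)        /* 16 Meg base: bits 27..24 */
         | (page4k << 12)                  /* bits 23..12 */
         | ((word_ptr & 0x3FFu) << 2);     /* bits 11..2; bits 1..0 = 00 */
}
```

Since each 12-bit entry addresses 4KB units and the base selects one of 16 blocks of 16MB, every page must lie in the same 16MB block below 256MB, as the 16M Base Register description states.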


3.3 TSP Subsystem 


3.3.1 Pre-calculation And Iteration 


Texturing and smooth shading both require evaluations of the type ax + by + c,
i.e. multiply-accumulate. Since pixels are input in tile order, and given that
most images have a high degree of coherency, it follows that input data will fall
into pixel spans which share the same texturing and shading information.


For a given span, the only difference between successive pixels is the value of
x. This allows the new value of ax + by + c to be calculated by a difference
equation, i.e.

$$f(x,y) = ax + by + c$$

$$f(x+1,y) = a(x+1) + by + c = f(x,y) + a$$


Consequently the texturing and shading operations are split into a pre- 
calculation phase followed by an iteration phase, since the iteration phase 
requires only an accumulator. 
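The difference equation translates directly into a pre-calculation followed by an accumulate-only loop, as in this C sketch. Integer coefficients stand in for the TSP's fixed-point values (an assumption), which makes the iterated result identical to evaluating ax + by + c afresh at every pixel; `iterate_linear` is a hypothetical name.

```c
#include <stdint.h>

/* f(x, y) = a*x + b*y + c for 'len' pixels starting at (x0, y). */
static void iterate_linear(int32_t a, int32_t b, int32_t c,
                           int32_t x0, int32_t y, int len, int32_t *out)
{
    int32_t f = a * x0 + b * y + c;   /* pre-calculation: one MAC pass */
    for (int i = 0; i < len; i++) {
        out[i] = f;
        f += a;                       /* f(x+1, y) = f(x, y) + a */
    }
}
```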


A further advantage is gained by placing a buffer between these units such that 
the units can operate in parallel. A buffer allows the performance on unequal 
span lengths to approach that of the ideal case. 


3.3.2 Parameter Fetching 


Each surface code points to a unique parameter block stored in system 
memory. 


The order in which the shading and texturing parameters are stored in memory 
is shown in Figure 14. 


Each set of parameters is prefixed by a control word which occupies two 32 bit 
words when stored in external memory. The control word specifies which of 
the parameter subsets (texturing, smooth shading etc.) are present. 
Low Memory
    TSP Instruction and Control Word
    Texture Parameters (Optional)
    Non Shadowed Smooth Shading Parameters (Optional)
    Shadow Light 1 Parameters (Optional)
    Offset Color Parameters (Optional)
High Memory


Figure 14 - Surface Code Parameter List General Format


3.3.3 TSP Instruction Description 


Each set of parameters is prefixed by a control word which occupies two 32 bit 
words when stored in external memory. The format of this is shown in Figure 
15, (the flat shading case), and Figure 16, (the smooth shading case). The 
control and texture control fields are the same for both. 


The offset x and y values supplied in the smooth shading case allow the use of 
x and y values in the shading equation which are relative to the centre of the 
polygonal facet which is being shaded. This permits the T parameters to be 
limited to 16 bit signed fixed point values. 


32 bits (Low Address), followed by 32 bits (High Address):

Control (10 bits) | Texture ctrl (14 bits) | Non Shadow Color (red | green | blue) | Shadow Color (16 bits)


Figure 15 - Instruction Format (Flat Shading)


32 bits (Low Address), followed by 32 bits (High Address):

Control (10 bits) | Texture ctrl (14 bits) | Reserved (8 bits) | x offset (10 of 16 bits) | y offset (10 of 16 bits)


Figure 16 - Instruction Format (Smooth Shading)


Control Word Format 


The control word is effectively the instruction for the texturing/shading, and 
also describes how to decode the remaining parameters. It consists of a number 
of fields, which are: 


Texture | Smooth Shade | Disable Fog | Shadow Flag | Reserved | Offset Color | Reserved
1 bit   | 1 bit        | 1 bit       | 1 bit       | 1 bit    | 1 bit        | 1 bit


VideoLogic & NEC Confidential 21 22 May 1997 


NEC 


• Texture field


The texture field indicates whether the surface is textured. A non-zero field
indicates a textured surface.


• Smooth Shade field


If this field is zero, then flat shading is used - the ‘base’ and ‘offset’ colors in 
the initial block are used to either directly specify the surface color, or to flat 
shade a texture. If the field is non-zero, then linear diffuse shading is used. 


• Disable Fog field


If this field is non-zero, then no fogging is added by TSP to the pixels.


• Shadow Flag


This indicates whether the current surface can accept a shadow cast on it or 
not. This is used along with a shadow flag passed from the ISP to generate per 
pixel shadows. If smooth shading is enabled then the shadow generation will 
use smooth shading. 


• Offset Color


When flat shading a surface, a ‘1’ in this field indicates that there is an offset
color parameter block in the data. The offset color can be used to globally raise
the intensity level of a flat shaded surface to simulate a highlight.


Texture Control Word Format 


When textures are enabled, the texture control word governs how the texture 
parameters are interpreted. It contains the following fields. 


Exponent | Reserved | Global Trans | Reserved
4 bits   | 1 bit    | 4 bits       | 1 bit


• Exponent


This is the exponent part for the pseudo floating point division operation that 
is performed within the texturing pipeline. 


• Global Transparency


These bits are combined with the texture to determine a level of translucency - 
the way this occurs depends on the texture type. This is described more fully in 
the section dealing with texture formats. 
• Flip UV


This determines how the repetitions of texture maps will behave. If set to flip 
in a texture dimension (either U or V), then the texture will ‘flip’ in alternate 
blocks. This is illustrated below for flipping in U. 


Figure 17 - Texture UV Flipping (illustrated for flipping in U)
Setting the most significant of the two bits flips U, and the least significant 
flips V. 
• Translucency Pass


If set to 1, this instructs TSP that this object is part of the translucency pass 
and that the texturing information will have to be mixed with the existing pixel 
color. 


3.3.4 Texture Parameters 


The texture parameter block requires 6 x 32 bit words, with the layout of the 
parameters being: 


Figure 18 - Texture Parameter Format (6 x 32 bit external words, from Low Memory to High Memory)


• a, b, c, d, e, f, p, q, r


These are the texture mapping parameters as defined in Equation 2. 
• D


Figure 19 - D Parameter for Mip Mapping


For example, a value of D = 1 means the selection of the full resolution texture
map, while a value of D = 2 results in the selection of the half resolution
texture map. A non-integer value for D (e.g. D = 1.5) results in a linear
interpolation of the two adjacent texture maps, either when the TSP is
operating in MIP mapped mode or when it is operating in adaptive bilinear
mode and D is 2 or greater. The value of D can be calculated by:

du’ dv’ dw’ “| 

+ 


D? = Max |— ,—+— 
c dx dy dy 


Base Pointer 


The Base Pointer gives the ‘address’ of the base of the texture map. The
address allows access at the 16 bit word level (thus for 8 bit textures, only
texture pixel pairs can be addressed).


The texture parameters allocate 32 bits for the texture base pointer. The system 
uses 24 bits of texture address (corresponding to 16M 16 bit words) with the 
top 8 bits used to describe the texture type. The format of these bits is as 
follows : 


8/16 Bit Maps | Map Size | 4444-555 | Reserved | Addr top
1 bit         | 2 bits   | 1 bit


These are explained below. 


Mip Mapped 


A value of ‘1’ indicates that the texture is MIP mapped, otherwise it is not. 


8/16 Bit Maps 


This bit indicates the color resolution of the texture. If the value is ‘1’, it
indicates that each pixel requires 16 bits, otherwise 8 bits. (Note that MIP
mapping cannot be performed on 8 bit textures.)
In the 16 bit mode, there are two formats: a ‘translucency’ format, which is
relevant only for the translucency phase, and an opaque format, which can be used
for both phases. The format for the opaque form is the same as the usual 16 bit format.


Reserved   Red      Green    Blue
1 bit      5 bits   5 bits   5 bits


The translucency format allows multiple transparency levels (alpha blending),
and is stored in the following format:


Alpha Red Green Blue 
4 bits 4 bits 4 bits 4 bits 


The 8 bit mode does not have the option of a pixel by pixel alpha value.
However, the texture will still be affected by the global translucency value. The
format for each pixel is as follows:


Red Green Blue 
3 bits 3 bits 2 bits 


Alpha Value 


In the ‘translucent’ texture, the alpha value is a fractional value between 0.0 
and 1.0, and is used in the following way: 


pixel color = alpha * background color + (1.0 - alpha) * shaded texture color


where the background color is whatever there is currently in the accumulation 
buffer. An alpha value of zero therefore makes a pixel completely opaque. 


Note : The four bits of alpha are converted to unsigned fractions in the range
0/16, 1/16, 2/16, ... 14/16, 16/16


i.e. the value 15/16 is rounded up to 16/16 to allow both completely opaque
and completely translucent pixels to be represented.
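
The alpha mapping and blend described above can be modelled in Python as follows. This is a floating point sketch with hypothetical function names, not the TSP's fixed point datapath.

```python
def alpha_fraction(alpha4):
    """Map a 4-bit alpha code to 0/16, 1/16, ..., 14/16, with 15 rounded up to 16/16."""
    if not 0 <= alpha4 <= 15:
        raise ValueError("alpha must be a 4-bit value (0-15)")
    return 1.0 if alpha4 == 15 else alpha4 / 16


def blend(alpha4, background, shaded_texture):
    """pixel = alpha * background + (1.0 - alpha) * shaded texture, per RGB channel."""
    a = alpha_fraction(alpha4)
    return tuple(a * bg + (1.0 - a) * tx for bg, tx in zip(background, shaded_texture))


print(blend(8, (0, 0, 0), (16, 32, 64)))  # (8.0, 16.0, 32.0)
```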


The shaded texture color is computed from the texture pixel color and the 
current shading parameters. 


Map Size 


The map size determines the dimensions of the texture map in terms of
number of pixels. All texture maps are square. The following bit patterns give
map sizes:
Bit Field   Map Size
4444-555


If using the 16 bit texture mode, then this indicates the format of the texture:
if ‘0’, it is in the opaque 555 format; if ‘1’, it is in the translucent 4444 format.
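
A small Python sketch of how the 32 bit base pointer splits into the 8 type bits and the 24 bit address. The address counts 16 bit words, so it is doubled to give a byte offset. The positions of the individual type flags within the top 8 bits are not decoded here, since they cannot be reliably recovered from the table above; the function names are hypothetical.

```python
def split_texture_pointer(pointer):
    """Split the 32-bit texture base pointer into (type_bits, word_address)."""
    if not 0 <= pointer <= 0xFFFFFFFF:
        raise ValueError("texture base pointer must be a 32-bit value")
    type_bits = pointer >> 24          # top 8 bits: texture type fields
    word_address = pointer & 0xFFFFFF  # 24-bit address in 16 bit words (16M words)
    return type_bits, word_address


def texture_byte_offset(word_address):
    """Convert a 16 bit word address into a byte offset."""
    return word_address * 2


type_bits, word_address = split_texture_pointer(0xA5123456)
print(hex(type_bits), hex(word_address), hex(texture_byte_offset(word_address)))
# 0xa5 0x123456 0x2468ac
```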


3.3.5 Non Shadowed Smooth Shading Parameters 


The non shadowed, or global, smooth shading parameters are stored in the
following form:


32 bit words, from Low Address to High Address: 16 bit color, ...


3.3.6 Shadow Light Parameters 
If flat shading, then the shadow color is stored in the control word, and so no 
extra parameters are needed. 


If smooth shading, then the parameter block has the same format as the smooth
shading parameter block.


3.3.7 Offset Color Parameters 


The parameter block stores an offset color value for both the normal lights and 
the shadow light. The shadow light parameter is only added if the surface is 
not in shadow. 


This feature is used to add a global ‘highlight’ to flat shaded surfaces. 


32 bit word: Highlight (16 bit color) | Shadow HL (16 bit color)


3.3.8 Parameter Cache


The TSP contains a 4KB parameter cache organised as 512 x 64 bits. The cache
block size is 4 x 64 bit words, which allows the parameter list for flat shaded,
textured objects with shadows enabled to fit exactly into a cache block.


The cache is also designed to support non-aligned data structures, so that 
shorter and longer parameter blocks can be packed efficiently into external 
memory. Figure 20 summarises the cache data organisation. 


On a hit, the data is retrieved from a cache address generated by concatenating 
{set}, {index} and {offset} (where the value of set depends on the status of the 
tag compare, and index and offset are fields within the surface code). 


On a miss, the complete cache block is fetched from external memory from an
address generated by concatenating {tag}, {index} and {‘000’}.
From a system viewpoint, this address is added to the value in the parameter
base address register (PREC_CALC @ address 2C), which allows the
parameter database to be relocated within system memory. This scheme allows
around 64k typical sized parameter blocks per scene (i.e. 64k unique surfaces).
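
The following Python sketch shows how the hit and miss addresses described above can be formed from an 18 bit surface code. It assumes the 10/6/2 bit split of Figure 20 with the tag in the most significant bits, and that the parameter base is expressed in the same 32 bit word units as the fetch address. The function names are hypothetical.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 10, 6, 2  # 18-bit surface code, tag in the MSBs (assumed)


def split_surface_code(code):
    """Split an 18-bit surface code into (tag, index, offset)."""
    if not 0 <= code < (1 << (TAG_BITS + INDEX_BITS + OFFSET_BITS)):
        raise ValueError("surface code must fit in 18 bits")
    offset = code & ((1 << OFFSET_BITS) - 1)
    index = (code >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = code >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def hit_cache_address(set_select, code):
    """{set, index, offset}: one of 512 64-bit cache entries (2 sets x 64 blocks x 4)."""
    _, index, offset = split_surface_code(code)
    return (set_select << (INDEX_BITS + OFFSET_BITS)) | (index << OFFSET_BITS) | offset


def miss_fetch_address(code, parameter_base=0):
    """{tag, index, '000'}: start of the 8-word (4 x 64 bit) block, in 32 bit words."""
    tag, index, _ = split_surface_code(code)
    return parameter_base + (((tag << INDEX_BITS) | index) << 3)


code = (5 << 8) | (3 << 2) | 1
print(split_surface_code(code), hit_cache_address(1, code), miss_fetch_address(code))
# (5, 3, 1) 269 2584
```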


Surface Code (18 bits): tag (10 bits) : index (6 bits) : offset into blocks (2 bits).
Cache data structure: Set 1 and Set 2, 64 blocks each, 64 bits wide.
External memory: 512k words, addressable on double word boundaries, offset by the Parameter Base Address.


Figure 20 - Cache Data Organisation


Figure 21 shows a block diagram of the cache architecture. 


The cache replacement logic implements a least recently used algorithm. 
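
As an illustration, a two-way (Set 1 / Set 2) tag store with least recently used replacement per index can be modelled as below. This is a behavioural sketch with a hypothetical class name, not a description of the actual tag RAM logic; with two ways, LRU reduces to one bit per index.

```python
class TwoWayLRUTags:
    """Tag store for a 2-way set associative cache with LRU replacement per index."""

    def __init__(self, num_indices=64):
        self.tags = [[None, None] for _ in range(num_indices)]
        self.lru_way = [0] * num_indices  # way to replace next at each index

    def access(self, tag, index):
        """Look up (tag, index); on a miss the LRU way is refilled. Returns (hit, way)."""
        ways = self.tags[index]
        for way in (0, 1):
            if ways[way] == tag:
                self.lru_way[index] = 1 - way  # the other way is now least recently used
                return True, way
        victim = self.lru_way[index]
        ways[victim] = tag                     # the cache fill replaces the LRU block
        self.lru_way[index] = 1 - victim
        return False, victim


cache = TwoWayLRUTags()
print([cache.access(t, 3) for t in (5, 7, 5, 9, 7)])
# [(False, 0), (False, 1), (True, 0), (False, 1), (False, 0)]
```

In the usage line, tag 9 evicts tag 7 because tag 5 was used more recently, and the later access to 7 then evicts 5.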
Figure 21 - Cache Block Diagram (Parameter DMA with Req, Ack, Address = tag + index and Data; 256x64 data RAM per set; tag compare and set select; Data Out)


3.3.9 Parameter DMA 


The parameter DMA unit fetches parameters from external SDRAM in 
response to a cache miss. Due to the arbitrary alignment between external 
memory and cache, in the worst case it is possible for a parameter block to 
span 3 cache blocks. To allow efficient operation, the parameter DMA unit 
contains an instruction decoder which determines the length of the current 
parameter block by decoding the fields in the control word. (Note that the 
control word must be decoded from the cache if the first block hits or from 
memory if the first block misses). 


If more than one cache block is required, the parameter DMA can then ‘look 
ahead’ to see if the next block is in the cache by auto incrementing the index 
field of the surface code and presenting it to the tag rams. If the next block is 
present then the parameter data can be supplied seamlessly across blocks. If it 
is not present then the cache fill can be initiated with minimum overhead. 


In all cases where the cache misses, the DMA unit must stall the precalculation
unit until data becomes available. The buffer between the precalculation unit
and the iteration pipelines lessens the impact of these stalls under many
practical conditions.
3.3.10 Pre-calculation Unit


The pre-calculation unit contains parallel multiply-accumulators in order to 
minimise pre-calculation latency. 


Figure 22 - Block Diagram Of Parameter Cache and Pre-calculation Unit (Parameter DMA with SDRAM interface request and data, Parameter Cache, Cache Control, Instruction Decode, Micro Sequencer and Pipeline Control; inputs are the surface code and the instruction word)


At peak rate, the texturing and shading pipelines can generate one pixel per clock
cycle. Under ideal conditions, and with an average of 7 pixels per span, the
pre-calculation time can be fully overlapped with the iteration time,
theoretically reaching this peak.


3.3.11 Texture Iteration Pipeline 


The Texture Iteration Unit takes the values from the Pre-calculation Unit, 
along with a span pixel count, and performs the two divisions, thus completing 
the ‘u,v’ evaluation. The value of the MIP map D parameter is also evaluated. 


The architecture is pipelined such that u,v and D values are computed on every 
clock cycle. 
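
Equation 2 is not reproduced in this section. Assuming it has the usual rational-linear form u = (ax + by + c)/(px + qy + r) and v = (dx + ey + f)/(px + qy + r), which fits the nine parameters a to r and the two divisions mentioned above, the iteration along a span can be sketched in Python as follows. Floating point is used for clarity, whereas the TSP uses a pseudo floating point divide; the function name is hypothetical.

```python
def span_uv(params, x0, y, n):
    """Per-pixel (u, v) along a span, ASSUMING the rational-linear form of Equation 2."""
    a, b, c, d, e, f, p, q, r = params
    # Pre-calculation: the two numerators and the shared denominator at the span start.
    num_u = a * x0 + b * y + c
    num_v = d * x0 + e * y + f
    den = p * x0 + q * y + r
    uv = []
    for _ in range(n):
        uv.append((num_u / den, num_v / den))  # the two divisions per pixel
        num_u += a                             # iteration: step x by one pixel
        num_v += d
        den += p
    return uv


print(span_uv((2, 0, 0, 0, 3, 0, 0, 0, 1), 5, 2, 3))  # [(10.0, 6.0), (12.0, 6.0), (14.0, 6.0)]
```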


3.3.12 Texture Formats 


Three texture formats are supported : 


1) 16 bit opaque textures - RGB (5:5:5) 
2) 16 bit translucent textures - alpha.RGB (4:4:4:4) 
3) 8 bit textures - RGB (3:3:2) 


Of these formats, 1 and 2 can be MIP mapped. 
Note : When MIP mapping translucent textures, the alpha value from the low
resolution MIP map is taken.
3.3.13 Texture Map Arrangement in Memory


The minimum configuration for the texture memory is 1MB. The two internal
banks of each device are used to provide fast page access to the two MIP map
texture components.


The maximum configuration for the texture memory is 4MB. 
To access a given texel in memory, a function of the form :


Texel_address = Texture_base_address + Offset_func(u, v)


must be computed to find the texel's location. Since texture co-ordinates are
equally likely to vary in the u or v directions, an efficient method of arranging
the texture map is to interleave the u and v index bits.


This arrangement keeps many adjacent pixels (adjacent in both u and v)
relatively close in the texture memory, therefore reducing the page breaks.
Figure 23 shows the top left corner of a texture map. The numbers in each
pixel show the offset at which the pixels are stored.
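
Since the contents of Figure 23 did not survive, the following Python sketch shows one possible interleaving, assuming u supplies the even bit positions and v the odd ones (the bit order actually used by PCX2 may differ). The helper names are hypothetical.

```python
def interleave_uv(u, v, bits):
    """Interleave the low `bits` bits of u and v (u -> even bit positions, v -> odd)."""
    offset = 0
    for i in range(bits):
        offset |= ((u >> i) & 1) << (2 * i)
        offset |= ((v >> i) & 1) << (2 * i + 1)
    return offset


def texel_address(texture_base, u, v, bits):
    """Texel_address = Texture_base_address + Offset_func(u, v)."""
    return texture_base + interleave_uv(u, v, bits)


# Offsets for the top left 4x4 corner of a map: rows are v, columns are u.
for v in range(4):
    print([interleave_uv(u, v, 2) for u in range(4)])
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]
```

Each 2x2 group of texels occupies four consecutive words, which is what keeps neighbours in both directions close together.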


Figure 23 - Texture Memory Pixel Alignment 


3.3.14 MIP Map Arrangement 


Since alternate levels of MIP map are accessed sequentially per pixel, it is 
important that page breaks do not occur between MIP map accesses. This is 
guaranteed by placing odd and even levels of MIP map in alternate banks of 
SDRAM, such that each bank can hold a separate row open. 


The following table shows the memory requirement for each level of MIP 
map. As can be seen, the requirement for the even levels is 1 + 16 + 256 + 4k 
+ 64k = 69905 16 bit words, whereas the requirement for the odd levels is 4 + 
64 + 1k + 16k = 17476 16 bit words. 
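
These totals can be checked in Python, assuming a 256x256 top level (the 64k term) and counting the 1x1 map as level 0, i.e. an even level. The function name is hypothetical.

```python
def mip_level_words(levels=9):
    """16 bit words per MIP level, from 1x1 (level 0) up to 256x256 (level 8)."""
    return [(1 << k) ** 2 for k in range(levels)]


sizes = mip_level_words()
even_words = sum(sizes[0::2])  # 1x1, 4x4, 16x16, 64x64, 256x256
odd_words = sum(sizes[1::2])   # 2x2, 8x8, 32x32, 128x128
print(even_words, odd_words)   # 69905 17476
```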
Table 2 - Mip Map Memory Requirements


In order to maintain equal capacity in each bank of SDRAM, it is necessary to 
interleave odd and even levels on a texture by texture basis. Consequently 
pairs of MIP maps are interleaved as shown in Figure 24. Texture A has its 
lowest resolution map (1x1) at the base address in bank X. The next map (2x2) 
is at the consecutive address in bank Y. Texture B has its 1x1 map at the base 
address in bank Y, and its 2x2 map at the consecutive address in bank X etc. 
This allows the same base pointer and addressing function to be used in both 
banks, and allows texture replacement to occur without too much 
fragmentation. 
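
The A/B interleave can be modelled as below. `mip_bank` and `pair_bank_usage` are hypothetical helpers that capture only the choice of bank, not the address within it. Summing one pair of textures shows why the scheme balances the banks: each bank receives one texture's even levels and the other texture's odd levels.

```python
def mip_bank(texture_parity, level):
    """Bank ('X' or 'Y') for MIP level `level` (0 = 1x1) of a texture.

    texture_parity 0 follows texture A (1x1 map in bank X);
    texture_parity 1 follows texture B (1x1 map in bank Y).
    """
    return "XY"[(level + texture_parity) % 2]


def pair_bank_usage(levels=9):
    """16 bit words used in each bank by one A/B pair of MIP mapped textures."""
    usage = {"X": 0, "Y": 0}
    for parity in (0, 1):
        for level in range(levels):
            usage[mip_bank(parity, level)] += (1 << level) ** 2
    return usage


print(pair_bank_usage())  # {'X': 87381, 'Y': 87381}
```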


Figure 24 - Mip Map Interleave (the MIP maps of texture A and texture B alternate between Bank X and Bank Y, starting at the Texture A Address and the Texture B Address)


3.3.15 Additional Features 


PCX2 supports two further MIP mapping modes: bilinear and adaptive bilinear
MIP mapping. These both use the same format of texture parameters as
discussed previously.
In bilinear texture filtering, the integer part of D (D_int) is used to select a MIP
map from which four pixels (those closest to the texture co-ordinates u and v)
are read. Bilinear texture filtering is then applied to achieve the required
texture value. Bilinear texture filtering can also be used in single mapped
modes.


In adaptive bilinear texture filtering, the hardware automatically selects one of
two modes depending upon D. If D_int is 1, then bilinear texture filtering is
applied to the top MIP map. If D_int is equal to or greater than 2, then linear
MIP mapping is used as described previously. This automatic selection of
MIP mapping mode allows filtering to be applied where it has the most effect,
in foreground surfaces, whilst performance is maintained for background
surfaces.
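
The mode selection in adaptive bilinear filtering reduces to a test on the integer part of D. A minimal Python sketch follows; the function name and return labels are hypothetical, and values of D below 1 are not covered by the text, so they are rejected.

```python
import math


def adaptive_bilinear_filter(d):
    """Filtering chosen in adaptive bilinear mode for a given MIP map parameter D."""
    if d < 1:
        raise ValueError("D below 1 is not covered by this description")
    d_int = math.floor(d)  # integer part of D
    if d_int == 1:
        return "bilinear"    # bilinear filtering of the top MIP map
    return "mip_linear"      # D_int >= 2: linear MIP mapping between adjacent maps


print([adaptive_bilinear_filter(d) for d in (1.0, 1.6, 2.0, 3.5)])
# ['bilinear', 'bilinear', 'mip_linear', 'mip_linear']
```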


The back end functions of TSP are translucency, masking, fog, and pixel 
format conversion. 


Translucency 


Translucency is performed as a multi pass process. First the opaque surfaces in 
a 32 pixel block are processed and their RGB:888 values transferred to a 32 
word ‘accumulation buffer’. The translucent surfaces are then processed in the 
following manner. 


Two types of translucency are required. 
Translucent Textures: 


Translucent textures are used in a variety of ways. They can implement punch 
through effects, such as trees, or blends, such as clouds or smoke. Translucent 
textures are stored in 16 bit 4:4:4:4 format, which is [alpha:R:G:B]. Here the 
alpha value is effectively a fractional value between 0.0 and 1.0, and is used in 
the following way : 


pixel color = alpha * background color + (1.0 - alpha) * shaded texture color


where the background color is the corresponding pixel stored in the 
accumulation buffer. An alpha value of zero therefore makes a pixel 
completely opaque. The shaded texture color is computed from the texture 
pixel color and the current shading parameters. The pixel_color is computed 
to 24 bit accuracy. 


Note : The four bits of alpha are converted to unsigned fractions in the range
0/16, 1/16, 2/16, ... 14/16, 16/16


i.e. the value 15/16 is rounded up to 16/16 to allow both completely opaque
and completely translucent pixels to be represented.


Global Translucency 


This is used either to create a uniform translucent surface, i.e. a pane of 
colored glass, or to allow textured and translucent textured objects to be ‘faded 
out’. 


Global translucency is a 4 bit parameter which is applied on a per surface 
basis. 
If the texture is a translucent 16 bit 4444 type, then the value is treated as an
unsigned 4 bit value (i.e. in the range 0-15) and is added to the pixel alpha
value (the total being limited to a maximum value of 15), thus allowing the
texture to be gradually faded out.


If the texture is a 16 bit 555, or an 8 bit type, then this behaves as a 4 bit 
"alpha" value, which is applied to the texture globally in a similar fashion to 
the translucent 4444 type texture. 
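
A minimal Python model of the two cases above, with a hypothetical function name. The result is the 4 bit alpha code, which is then converted to a fraction and blended as described for translucent textures.

```python
def effective_alpha(global_trans, texel_alpha=None):
    """4-bit alpha after applying the per-surface global translucency value.

    texel_alpha is the per-pixel alpha of a 4444 texture, or None for 555 and
    8 bit textures, which have no per-pixel alpha.
    """
    if not 0 <= global_trans <= 15:
        raise ValueError("global translucency is a 4-bit value (0-15)")
    if texel_alpha is None:
        return global_trans                     # acts as a global 4-bit alpha
    return min(texel_alpha + global_trans, 15)  # added to texel alpha, limited to 15


print(effective_alpha(8, 10), effective_alpha(6))  # 15 6
```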

Masking 

Pixel masking supports 3D rendering into memory which already has image 
data in it (typically 2D scenery). A background plane, covering the entire scene 
area, is defined using a reserved surface code of 1. The ISP processes this as 
normal and any spans which do not contain any other 3D data are passed to the 
TSP with this surface code. The TSP will process this pixel data as flat shaded 
pixels and store them in the accumulation buffer. So long as this data is not 
modified in a subsequent translucent pass, it is tagged as masked and no write 
data is generated to the image buffer. 


Fogging

Since fog density is a function of depth this parameter is computed in ISP and 
passed to TSP as an 8 bit intensity value. Pixels are fogged prior to storage in 
the accumulation buffer. The fogging operation is simply an interpolation 
between a fog color and the current pixel color, i.e. : 

fogged_pixel_color = intensity*fog_color + (1.0-intensity) * input_pixel_color 


The fogged_pixel_color is computed to 24 bit accuracy. 
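
The fogging interpolation can be sketched in Python as below. The sketch takes the intensity as a fraction between 0.0 and 1.0; how PCX2 maps its 8 bit intensity value onto that range is not stated here, so that scaling is left to the caller. The function name is hypothetical.

```python
def fog(intensity, fog_color, pixel_color):
    """fogged = intensity * fog_color + (1.0 - intensity) * input color, per channel."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0.0, 1.0]")
    return tuple(intensity * f + (1.0 - intensity) * p
                 for f, p in zip(fog_color, pixel_color))


print(fog(0.5, (100, 100, 100), (0, 50, 100)))  # (50.0, 75.0, 100.0)
```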


3.3.16 Pixel Formatter 
The pixel formatter consists of two main functions, namely the dithering unit 
which is followed by the pixel packer. 
The dithering unit can optionally be enabled. It performs ordered dither on the
24 bit rendered pixels, enabling 16 bit dithered formats for improved quality.
The dithering function is independent of the formats used.
The pixel packer performs two functions: the formatting and packing of
the pixels into packets, and an optional little/big endian mode
change.
As a result, PCX2 supports a wide variety of standard (undithered) and dithered
formats, all available in little or big endian modes.
These formats are summarised below:

Standard 16 bit formats: Dithered 16 bit formats: Standard 24 bit formats: 
RGB 565 packed RGB 565 packed RGB 888 packed 
RGB 555 packed RGB 555 packed RGB 888 unpacked 
BGR 565 packed BGR 565 packed BGR 888 packed 
BGR 555 packed BGR 555 packed BGR 888 unpacked 
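
To illustrate what the dithering unit and pixel packer do for one of the 16 bit formats, the following Python sketch converts an RGB 888 pixel to RGB 565 with an optional ordered dither. The 4x4 Bayer matrix is an assumption (this section does not give the PCX2 dither pattern), endian conversion is omitted, and the function names are hypothetical.

```python
# Standard 4x4 Bayer threshold matrix (values 0-15). This is an assumption for
# illustration: the actual PCX2 dither pattern is not given in this section.
BAYER4 = (
    (0, 8, 2, 10),
    (12, 4, 14, 6),
    (3, 11, 1, 9),
    (15, 7, 13, 5),
)


def _reduce(value, drop_bits, threshold):
    """Drop `drop_bits` low bits of an 8 bit channel after adding a dither bias."""
    bias = (threshold << drop_bits) // 16  # 0 .. 2**drop_bits - 1
    return min(value + bias, 255) >> drop_bits


def pack_rgb565(r, g, b, x=0, y=0, dither=False):
    """Pack an RGB 888 pixel at screen position (x, y) into RGB 565."""
    t = BAYER4[y % 4][x % 4] if dither else 0
    return (_reduce(r, 3, t) << 11) | (_reduce(g, 2, t) << 5) | _reduce(b, 3, t)


print(hex(pack_rgb565(200, 100, 50)))  # 0xcb26
```

Without dithering the low bits are simply truncated; with dithering, a channel value that falls between two 5 or 6 bit levels rounds up at a fraction of the positions in each 4x4 block, so the average over the block approximates the original 8 bit value.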
When operating in a packed mode, the start of frame address must align to an
exact packing boundary (i.e. 2 pixel alignment for 16 bit packed and 4 pixel 
alignment for 24 bit packed). 


3.4 Memory Arbiter and SDRAM Interface 


3.4.1 Overview 


The synchronous DRAM interface in PCX2 is designed to work with most 
common types of SDRAM and SGRAM devices. If SGRAM devices are used, 
then it is intended that the DSF (or special function) pin is tied low (inactive) 
to ensure the device operates as an SDRAM. The interface has programmable 
timing registers which allow for performance to be maximised according to 
the clock frequency used, and for the timing constraints of different devices. 
The interface requires 32 bit data and supports 1, 2 or 4 megabytes. The 
following description assumes some knowledge of the terminology used in 
SDRAM devices. 


The SDRAM mode register is set to 0000110000 (binary) at power-up and is 
never changed. 
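
Read against the standard JEDEC SDR SDRAM mode register layout (not reproduced in this document), the value 0000110000 selects a burst length of 1, sequential bursts and a CAS latency of 3, which agrees with the device requirements listed in section 4.2. A small Python decoder, with hypothetical names:

```python
# Burst length codes for A2..A0; other codes (e.g. full page) return None here.
BURST_LENGTHS = {0b000: 1, 0b001: 2, 0b010: 4, 0b011: 8}


def decode_sdram_mode(value):
    """Decode an SDR SDRAM mode register value using the JEDEC A9..A0 layout."""
    return {
        "burst_length": BURST_LENGTHS.get(value & 0b111),                   # A2..A0
        "burst_type": "interleaved" if (value >> 3) & 1 else "sequential",  # A3
        "cas_latency": (value >> 4) & 0b111,                                # A6..A4
        "operating_mode": (value >> 7) & 0b11,                              # A8..A7, 0 = standard
        "single_location_write": bool((value >> 9) & 1),                    # A9
    }


print(decode_sdram_mode(0b0000110000))
# {'burst_length': 1, 'burst_type': 'sequential', 'cas_latency': 3, 'operating_mode': 0, 'single_location_write': False}
```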


3.4.2. Arbitration of Synchronous DRAM 


The synchronous DRAM is a shared resource, providing data for a multitude 
of functions within PCX2. The following functions need to be shared for the 
SDRAM: 


• CPU accesses (read and write)
• ISP parameter accesses (read and write)
• TSP parameter accesses (read)
• Texture pixel accesses (read)


The accesses are arranged in a fixed priority which has been determined from 
the dynamic behaviour of the system, in order to maximise performance. The 
order used is: 


1) ISP parameter writes
2) ISP parameter reads
3) Texture pixel reads (in currently open page)
4) TSP parameter fetches
5) CPU writes
6) Texture pixel reads (others)
7) CPU reads
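
The fixed priority scheme amounts to granting the first pending request in this list. A minimal Python sketch follows; the request labels are hypothetical names for the access types above, and details such as how long a grant is held are not modelled.

```python
# Request names are hypothetical labels for the access types listed above.
PRIORITY = (
    "isp_param_write",
    "isp_param_read",
    "texture_read_open_page",
    "tsp_param_fetch",
    "cpu_write",
    "texture_read_other",
    "cpu_read",
)


def grant(pending):
    """Return the highest priority pending request, or None if nothing is pending."""
    pending = set(pending)
    for request in PRIORITY:
        if request in pending:
            return request
    return None


print(grant({"cpu_read", "tsp_param_fetch", "texture_read_other"}))  # tsp_param_fetch
```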


3.4.3 Setup and Programming 


The registers need to be programmed to meet the constraints of the SDRAM 
devices used. After hard reset, the registers are set to their maximum values in 
order that slow devices go through the correct power-up sequence. The 
exception to this is the refresh_count register which resets to zero, so the
devices are continuously being refreshed after power-up. 


All registers except refresh_count are programmed so that 0 represents one
cycle delay (which is the minimum delay, as only one command can be issued
at each cycle). For example, if act0_to_act1 is programmed to 0, then the
SDRAM interface is instructed that an active0 command can be issued on the
cycle directly after an active1; if the value programmed is 1, then 1 clock cycle
is required between these two commands. These register values can be determined
from the relevant SDRAM specification.


The values entered in all these registers are in units of one PCX2 clock
cycle. This is denoted by the symbol ‘T’, a period in nanoseconds (e.g. 15 ns =
66.6 MHz). Most values are given in nanoseconds and should be converted to
clock cycles by the following formula :


number_of_clocks = (round_up_to_nearest_integer(time / T)) - 1


e.g.
tRAS = 70 ns
T = 16 ns
act_to_pre = round_up_to_nearest_integer(tRAS / T) - 1
           = round_up_to_nearest_integer(4.375) - 1
           = 5 - 1
           = 4 (clock cycles)
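
The same conversion in Python, as a sketch with a hypothetical function name. The clamp at 0 reflects the statement that 0 already represents the minimum one cycle delay; as noted above, the formula does not apply to refresh_count.

```python
import math


def ns_to_clocks(time_ns, period_ns):
    """Register value for a delay: roundup(time / T) - 1, where 0 means one clock."""
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    # One cycle is the minimum delay, so never go below a register value of 0.
    return max(math.ceil(time_ns / period_ns) - 1, 0)


print(ns_to_clocks(70, 16))  # 4
```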


3.4.4 Timing 


The SDRAMs must be compatible with the clock to output delays on PCX2, 
and all other electrical timing considerations of the board on which it is to be 
used. There is one parameter which is of particular importance. This is tAC, 
the clock to data out time of the SDRAM data lines. 


Given the current technology used for PCX2, the maximum tAC value is
generally required to be less than 10 ns. Some common SDRAMs do not meet
this timing requirement for versions slower than 10 ns (‘-10’ parts). A full
timing analysis is required as a matter of course in any board design;
section 4.2 should be consulted for an accurate assessment of which devices
are suitable.


3.5 PCI Interface 


3.5.1 Overview 


The PCI bus interface of PCX2 is an industry standard 66MHz PCI rev 2.1 
compliant interface. The interface has both PCI Slave and PCI Master 
functionality. The following diagram shows the relevant data paths and their 
interaction with the PCI Interface. 
Figure 25 - PCI Interface Data Flow


For the transactions requiring PCX2 to be a PCI master, an arbiter is used to 
queue the requests in the correct order of priority. Programming of this arbiter 
is discussed later in this section. The mastership arbiter is separate from the 
PCI bus arbiter contained within the PCI Interface Module. 


The PCI Interface Module is required to allocate memory configuration spaces
as follows:

Memory Space 0 : 64KB Control Data (Slave IO)

Memory Space 1 : 4MB Texture Params & Data (Slave IO)

Configuration space parameters of PCX2 are as follows: 

Vendor ID : 1033 H 

Device ID : 0046 H 

Class Code : 048000 H (Other Multimedia Device). 


PCX2 is a medium speed device for DEVSEL#. 
3.5.2 PCI Bus Mastership Arbitration Options 


PCX2 contains an arbiter to determine whether the next PCI bus mastership 
slot allocated to the device should be used for reading of ISP surface 
parameters or for the transferring of rendered data to the graphics controller. 
Once the request for the next access has been determined, it will only be 
changed should the current mastership cycle be disconnected or aborted by the 
PCI bus, even if a request of higher priority subsequently comes along. 


Programming of the ARBMODE register allows the default behaviour of this 
arbiter to be modified as follows. 


Arbitration Overlap 


The default behaviour of the arbiter is to wait until the current bus mastership 
transaction has been completed before arbitrating for the next transaction. The 
maximum burst in a single request is 32 words, and so PCX2 will by default 
relinquish mastership of the PCI bus after a maximum of 32 words transferred, 
regardless of the latency timer value. By programming the
OVERLAP_CONTROL bits of the ARBMODE register, PCX2 can be made
to arbitrate immediately following commencement of the current mastership
cycle (01), or when the current burst has one more item of data to transfer (10), 
thus allowing the PCIREQ# signal to be asserted earlier. 


Arbitration Continue 


For mastership requests to consecutive addresses, PCX2 will by default start a 
new PCI cycle for each burst, thus incurring the overhead of re-arbitrating for 
the PCI bus and performing the address and turnaround cycles. By setting the 
CONTINUE bit in the ARBMODE register, these consecutive bursts will be 
amalgamated into a single PCI transaction, up to the latency counter value 
limit. This mode is only operational when arbitration overlap is enabled. 


ISP Parameters vs. Video Priority 


The reading of ISP parameters should generally take priority over writing of 
video to the graphics controller. By default, these priorities reverse when the 
video output fifo becomes full, as this situation then stalls the whole of the 
processing pipeline. The PRIORITY_CONTROL bits of the ARBMODE 
register allow the threshold of priority switching to be altered. 


3.5.3 PCI Master Aborted Cycles 


When a PCI master cycle is aborted, the address currently being accessed is 
latched, and an interrupt is generated. (Whether this interrupt is seen on 
PCIINTA depends on the INTMASK setup). The latched address is accessed 
by reading the ABORTADDR register. 


3.5.4 Programming for Video Master Writes 


Correct operation of the Video Interface is achieved by programming the 
LSTRIDE and SOFADDR registers with the appropriate values before the 
commencement of render of each frame. 


LSTRIDE should be programmed with the line stride value in bytes. 


LSTRIDE = Pixel width of image destination display 
x Number of bytes per pixel (dependent on packing 
mode). 


SOFADDR should be programmed with the start of frame address. When
operating in a packed mode, the start of frame address must align to an exact
packing boundary (i.e. 2 pixel alignment for 16 bit packed, 4 pixel alignment
for 24 bit packed).


Horizontally, the left and right edges of the image can be masked by setting the 
appropriate pixel values in the XCLIP register. 
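
A Python sketch of the LSTRIDE and SOFADDR calculations. It assumes 4 bytes per pixel for the unpacked 24 bit formats, and that the destination surface base itself lies on a packing boundary, so the alignment rule reduces to the starting x coordinate being a multiple of the packing group. All names are hypothetical.

```python
# Bytes per pixel for the output modes; 4 bytes for unpacked 888 is an assumption.
BYTES_PER_PIXEL = {"16_packed": 2, "24_packed": 3, "24_unpacked": 4}
# Packing group size in pixels for the packed modes.
ALIGN_PIXELS = {"16_packed": 2, "24_packed": 4}


def lstride(display_width, mode):
    """LSTRIDE = pixel width of the destination display x bytes per pixel."""
    return display_width * BYTES_PER_PIXEL[mode]


def sofaddr(surface_base, x, y, display_width, mode):
    """Start of frame address for a render window whose top left pixel is (x, y)."""
    if x % ALIGN_PIXELS.get(mode, 1):
        raise ValueError("start of frame is not on a packing boundary")
    return surface_base + y * lstride(display_width, mode) + x * BYTES_PER_PIXEL[mode]


print(lstride(640, "16_packed"), hex(sofaddr(0x100000, 4, 2, 640, "24_packed")))
# 1280 0x100f0c
```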


3.5.5 Estimation of PCX2 PCI Bandwidth and Latency Requirements 
Taking into account the parameter caching mechanisms and associated double
buffering, the latency and bandwidth requirements of PCX2 parameters and
output image can be estimated as shown in the following tables. These
estimates assume 16-bit image formats.


Bandwidth Requirements


                    @ 30FPS                    @ 30FPS
100k polygons/s     Parameters: <8MB/s         Parameters: <8MB/s
                    Image: 5MB/s               Image: 18.5MB/s
                    Total: <13MB/s             Total: <26.5MB/s
200k polygons/s     Parameters: <16MB/s        Parameters: <16MB/s
                    Image: 5MB/s               Image: 18.5MB/s
                    Total: <21MB/s             Total: <34.5MB/s


Latency Requirements


                    @ 30FPS                    @ 30FPS
100k polygons/s     ISP Parameters: >2ms       ISP Parameters: >400us
                    TSP Parameters: >30ms      TSP Parameters: >30ms
                    Image: >100us              Image: >50us
200k polygons/s     ISP Parameters: >2ms       ISP Parameters: >200us
                    TSP Parameters: >30ms      TSP Parameters: >30ms
                    Image: -                   Image: >50us


3.6 Built-in Self Test on Internal Memories 


Built in self test is included on the internal memory blocks. Two forms of tests 
are possible, a data test and an address test, controlled by the 
MEMTEST_MODE register. Writing to this register also commences the test. 


Reading the MEMTEST_RES* registers yields the result of the test. The
MEMTEST_RES1 register should be polled until the MEMTEST_FINISHED
bit is set, at which point any of the result bits being set indicates a memory
failure.
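
The polling sequence can be sketched in Python as follows. The register access functions are passed in because the host interface is not specified here, the MEMTEST_FINISHED bit position is a placeholder rather than the real register map value, and only MEMTEST_RES1 is checked; any other MEMTEST_RES* registers would be read the same way.

```python
import time

# Hypothetical bit position; the real MEMTEST_FINISHED bit is given in the register map.
MEMTEST_FINISHED = 1 << 31


def run_memtest(write_reg, read_reg, mode, timeout_s=1.0):
    """Start a built-in memory test and return True if no failure bits are set."""
    write_reg("MEMTEST_MODE", mode)  # writing the mode register starts the test
    deadline = time.monotonic() + timeout_s
    while True:
        result = read_reg("MEMTEST_RES1")
        if result & MEMTEST_FINISHED:
            break
        if time.monotonic() > deadline:
            raise TimeoutError("MEMTEST_FINISHED was not set in time")
    # Any other result bit being set indicates a memory failure.
    return (result & ~MEMTEST_FINISHED) == 0
```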
4. Interface Description


4.1 PCI Signal Description 


Signal     Dir      Description
PCICLK     in       Clock for PCI interface
PCIRST#    in       PCI reset, used as master reset for PCX2
PCIAD      in/out   PCI Address Data bus
PCICBE     in/out   PCI Bus Command and Byte Enables
PCIPAR     in/out   PCI Parity
PCIFRM#    in/out   PCI Cycle Frame
PCIREQ#    out      PCI Master Request
PCIGNT#    in       PCI Master Grant
PINTA#     out      PCI Interrupt A


4.2 SDRAM Signal Description and Configuration 


The SDRAM interface is designed to support 1, 2 or 4 megabytes of SDRAM. 
The intended configurations are: 


1MB - 1 x 32 bit data 8Mbit SDRAM/SGRAM device, or
    - 2 x 16 bit data 4Mbit SDRAM/SGRAM devices


This is the simplest configuration, providing a matrix of 256 words per row by 
1024 rows in 2 banks. If there are two 16 bit devices, they are accessed 
simultaneously and each provides half the data word. 


2MB - 2 x 32 bit data 8Mbit SDRAM/SGRAM devices


This configuration uses the two CS (chip select) lines to activate the appropriate
device depending on the address. It provides a matrix of 512 words per row by
1024 rows in 2 banks.


4MB - 2 x 16 bit data 16Mbit SDRAM/SGRAM devices


This configuration uses two 16 bit devices both operating simultaneously, each 
providing half the data word. It provides a matrix of 256 words per row by 
4096 rows in 2 banks. 
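
The three matrices are consistent with the stated capacities if the row count is read as the total over both banks (for example 512 rows per bank in the 1MB case). A quick Python check, with hypothetical names:

```python
def capacity_bytes(words_per_row, total_rows):
    """Capacity of the 32 bit wide array: words per row x rows x 4 bytes."""
    return words_per_row * total_rows * 4


CONFIGS = {"1MB": (256, 1024), "2MB": (512, 1024), "4MB": (256, 4096)}
for name, (words, rows) in CONFIGS.items():
    print(name, capacity_bytes(words, rows) // (1 << 20), "MB")
# 1MB 1 MB
# 2MB 2 MB
# 4MB 4 MB
```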
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The following table shows an example of how the pins are connected: 


PCX2 pin      Device pin
SDRAS         RAS
SDCAS         CAS
SDWE          WE
SDDQM0        DQM0-3
SDDQM1        DQM0-3 (for a 2nd device)
DSF (SGRAM)
Tied to Vdd 


Multiple DQM lines are provided for electrical drive strength; all four are
always driven to the same level at the same time.


Most types of SDRAM/SGRAM devices available at the publication date of
this document are supported by PCX2. Most devices offer more functionality
than this interface requires; however, some devices cannot be used because
they do not meet all of the requirements, in particular the timing
constraints.


The minimum requirements for suitable SDRAMs are: 


• Meet the tAC timing constraint for the clock frequency used.
• Support a CAS latency of 3.
• Support a burst length of 1, for both read and write.
• Be used in a configuration that gives 32 data bits, with the 2 CS
  signals as described above.


In particular, the devices used do not have to support: 


• Auto precharge
• Power-down
• Self refresh
• Burst read/write lengths greater than 1 word
• Burst stop command
• CAS latencies other than 3
• Special SGRAM functions
4.3 PowerVR Miscellaneous Signals


4.3.1 General Purpose I/O Signals


PCX2 contains a 16 bit general purpose I/O port, GP(15:0), controlled through
the GPPORT register. GPPORT bits 31:16 control the direction of GP(15:0)
respectively (1 = output, 0 = input), while bits 15:0 set the value driven on
GP(15:0) for the pins configured as outputs. Reading bits 15:0 of the
GPPORT register returns the current level of the GP pins.
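A minimal sketch of GPPORT access in C, assuming memory space 0 is mapped at `base`. The helper names are hypothetical; the offset (0058H) and bit layout are from the GPPORT register description. This version writes the whole register, so every pin not in `out_mask` becomes an input.

```c
#include <stdint.h>

#define PCX2_GPPORT 0x0058u   /* byte offset in memory space 0 */

/* Bits 31-16: direction of GP(15:0), 1 = output, 0 = input.
 * Bits 15-0: value driven on the pins configured as outputs. */
static inline uint32_t gpport_value(uint16_t out_mask, uint16_t out_data)
{
    return ((uint32_t)out_mask << 16) | out_data;
}

/* Drive data on the pins selected by out_mask; all other pins become inputs. */
static void gp_configure(volatile uint32_t *base, uint16_t out_mask, uint16_t data)
{
    base[PCX2_GPPORT / 4] = gpport_value(out_mask, data);
}

/* Bits 15-0 read back the current level of the GP pins. */
static uint16_t gp_read_pins(volatile uint32_t *base)
{
    return (uint16_t)(base[PCX2_GPPORT / 4] & 0xFFFFu);
}
```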


4.3.2 Clock Signals 


PCX2 internally contains all the clock buffering necessary to drive the device
and its associated SDRAMs. The core oscillator should be connected to the
PVRCLK input pin. This input drives three output buffers: MEMCLK0 and
MEMCLK1, which feed the respective SDRAM banks, and CLKCMPO.
CLKCMPO should be connected (via a delay chain if necessary) to
CLKCMPI. The path delay to the MEMCLK0 and MEMCLK1 outputs can be
adjusted using the CLK_SELECT register.


4.3.3 Test Signals 


The following signals are used during silicon testing, and should be tied to the 
recommended value for normal operation. 


NAME    DIRECTION    VALUE
TM1     input
TM2     input
4.4 AC Timing Characteristics


4.4.1 SDRAM timing 


PCX2 has the following internal clock architecture. 
[Diagram: PVRCLK, programmable delay chain, clock tree delay, CLK to Dout,
CLKCMPO, DATA/ADDRESS and MEMCLK0, with input/output pads marked]


Figure 26. PCX2 Clock Architecture


The various elements have the following timing characteristics: 


Parameter              Timing
Pad delay
PVRCLK to CLKCMPO      2.3ns, 3.8ns
PVRCLK to MEMCLK       6.2ns (with CLK_SELECT = 0)
CLK to Dout            9.1ns
Data setup (tPSU)      0.5ns min
Data hold (tPH)        1.5ns min


Then for -12nS SDRAM devices with the following timings: 


Data/address input setup (tSDSU)    3.5ns min
Data/address input hold (tSDH)      1.5ns min
Access time (tAC)
Data output hold (tOH)
[Timing diagram: memclk, cmd, data and clkcmp waveforms, annotated with
tSDSU, tSDH, CLK to Dout, tAC, tOH, tPSU and tPH]


Figure 27. SDRAM Timing Diagram


A 1.2nS delay (using a 7.5” track) is added between CLKCMPO and CLKCMPI, which
gives a design with the following margins at 75MHz (66MHz operation +10%):


Parameter   Minimum   Actual    Margin
tSDSU       3.5nS     4.68nS    1.18nS
tSDH        1.5nS     8.65nS    7.15nS
tPSU        0.5nS     6.13nS    5.63nS
tPH         1.5nS     2.2nS     0.7nS
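Each margin above is simply the actual value minus the minimum requirement. The short C sketch below recomputes them from the quoted figures and picks out the tightest one (tPH at 0.7nS); the struct and function names are illustrative only.

```c
#include <stddef.h>

/* Minimum requirement and achieved value, in ns, as quoted above. */
struct timing_check { const char *name; double min_ns; double actual_ns; };

static const struct timing_check checks[] = {
    { "tSDSU", 3.5, 4.68 },
    { "tSDH",  1.5, 8.65 },
    { "tPSU",  0.5, 6.13 },
    { "tPH",   1.5, 2.2  },
};

/* Margin = actual - minimum; a negative value would be a timing violation. */
static double margin_ns(const struct timing_check *c)
{
    return c->actual_ns - c->min_ns;
}

/* Index of the check with the smallest margin. */
static size_t tightest(const struct timing_check *c, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (margin_ns(&c[i]) < margin_ns(&c[best]))
            best = i;
    return best;
}
```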
VideoLogic & NEC Confidential 43 22 May 1997 


4.4.2 Clock Timing 


[Timing diagram: CLK, INPUT, OUTPUT-A, OUTPUT-B and OUTPUT-C waveforms]


Figure 24 - Clock and Input / Output Timings
5. Register Map and Programming Guide


The following section describes how to programme PCX2 for correct PCI
operation. PCX2 has two slave memory areas assigned. Memory space 0 is for
the internal register bus and is 64KB in size. Memory space 1 is for access to
the external SDRAM and is 4MB in size. Refer to the PCI bus specification for
more detail on PCI operation.


5.1 PCI Setup 


PCX2 registers are programmed via accesses to PCI memory space 0, as set
during PCI configuration. This memory space is 64KB in size. Register access
within PCX2 is 32 bit word aligned; byte masking is not supported. Addresses
given in this document are byte offsets, which must be added to the base
address written to the configuration register to obtain the absolute (32 bit)
PCI address.
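The address calculation above can be sketched in C. This is a minimal sketch: `bar0` stands for the memory space 0 base address as read back from PCI configuration, and the function names are hypothetical. Per the PCI specification, the low 4 bits of a memory base address register are flag bits, so they are masked off before the byte offset is added.

```c
#include <stdint.h>

/* Memory space 0 is 64KB; offsets in this document are byte offsets. */
#define PCX2_REG_SPACE_SIZE 0x10000u

/* Absolute 32-bit PCI address of a register: base address + byte offset.
 * Bits 3-0 of a PCI memory BAR are type/prefetch flags, not address bits. */
static uint32_t pcx2_reg_pci_addr(uint32_t bar0, uint32_t offset)
{
    return (bar0 & ~0xFu) + offset;
}

/* Register access is 32 bit word aligned with no byte masking, so only
 * offsets that are multiples of 4 inside the 64KB window are usable. */
static int pcx2_reg_offset_valid(uint32_t offset)
{
    return (offset & 3u) == 0 && offset < PCX2_REG_SPACE_SIZE;
}
```

For example, with a base of E0000000H, STARTRENDER (0014H) is at PCI address E0000014H.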


5.2 Address Map 


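0000 - 0020    Control registers (see 5.3)
0024           RESERVED
0028 - 0078    Control registers (see 5.3)
007C - 01E4    RESERVED
01E8 - 01FC    Memory test registers
0200 - 03FC    ISP fog look up table
0400 - 05FC    ISP TLB
0600 - 07FC    RESERVED
0800 - 0FFC    Divider table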


5.3 Register Descriptions 


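ADDRESS (H)    NAME            DESCRIPTION
006C           TMEM_READ       ENABLE TEXTURE MEMORY READ BACK ON PCI 66MHZ
0070           CLK_SELECT      DELAY SELECTOR FOR CLK TO SDRAM
0074           FASTFOG         FAST RENDERING WHEN FOGGED
0078           POWERDOWN       ENABLE FOR POWERDOWN OF INTERNAL RAM
007C - 01E4    RESERVED
01E8           MEMTEST_DATA    DATA FOR MEMORY TEST
01EC           MEMTEST_MODE    MEMORY TEST MODE
01F0           MEMTEST_RES1    MEMORY TEST RESULT 1
01F4           MEMTEST_RES2    MEMORY TEST RESULT 2
01F8           MEMTEST_RES3    MEMORY TEST RESULT 3
01FC           MEMTEST_RES4    MEMORY TEST RESULT 4
0200 - 03FC    FOG TABLE       ISP FOG LOOK UP TABLE
0400 - 05FC    ISP TLB
0600 - 07FC    RESERVED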


ARBMODE ADDRESS 0044H
bits 1-0 = OVERLAP CONTROL
00: No overlap.
01: Full overlap.
10: Half overlap.
bit 2 = CONTINUE ON READ
0 = Disabled
1 = Enabled
bit 3 = CONTINUE ON WRITE
0 = Disabled
1 = Enabled
bits 5-4 = PRIORITY CONTROL
00: Video has priority when FIFO full.
01: Video has priority when FIFO 3/4 full.
10: Video always has priority.
11: Video never has priority.


ABORTADDR (READ ONLY) ADDRESS 0054H
bits 31-2 = Address of the master cycle that caused an abort.
bits 1-0 = CYCLE TYPE
01: Write
10: Read


BILINEAR_MODE ADDRESS 0064H
bits 1-0 = Bilinear Mode
00 Full bilinear
01 Adaptive bilinear only
10 n/a
11 Disabled
bit 2 = SDRAM Caching
0 Enabled
1 Disabled
CAMERA ADDRESS 003CH 
bits 10-0 = camera Z angle 
CLK_SELECT ADDRESS 0070H 
bits 2-0 = Select the delay on MEMCLK used for the SDRAMs
000 Delay 1 from PVRCLK
001 Delay 1 from CLKCMPI
010 Delay 2 from CLKCMPI 
011 Delay 2 from PVRCLK 
100 Delay 3 from PVRCLK 
101 Delay 4 from PVRCLK 
110 Delay 5 from PVRCLK 
111 Delay 1 from PVRCLK 
FASTFOG ADDRESS 0074H 
bits 7-0 = Fast fog level
bit 8 = Fast fog mode
0 Disabled
1 Enabled


Fast fog mode improves the efficiency of fogging by comparing the fog level
of each output pixel from the ISP with the fast fog level and, if it is greater,
setting its surface code to 2. This allows the TSP to perform a flat shade on
the pixel span, avoiding unnecessary texture fetches.
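The C sketch below shows how a FASTFOG value might be composed and models the per-pixel rule described above. The model function is a hypothetical software illustration, not the hardware implementation, and it assumes the per-pixel fog level is compared as an 8-bit value against bits 7-0.

```c
#include <stdint.h>

/* FASTFOG (0074H): bits 7-0 fast fog level, bit 8 fast fog mode enable. */
static uint32_t fastfog_value(int enable, uint8_t level)
{
    return ((enable ? 1u : 0u) << 8) | level;
}

/* Hypothetical model of the decision: when fast fog mode is enabled and a
 * pixel's fog level is greater than the fast fog level, its surface code
 * becomes 2 (flat shade, no texture fetch); otherwise it is unchanged. */
static unsigned fastfog_surface_code(uint32_t fastfog_reg, uint8_t pixel_fog,
                                     unsigned surface_code)
{
    int enabled = (int)((fastfog_reg >> 8) & 1u);
    uint8_t level = (uint8_t)(fastfog_reg & 0xFFu);
    return (enabled && pixel_fog > level) ? 2u : surface_code;
}
```

Note the comparison is strictly "greater than", so a pixel exactly at the fast fog level keeps its original surface code.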


FLOATMODE ADDRESS 0060H 
bit 0 = 4 word to 3 word parameter conversion 
0 Disabled 
1 Enabled 
bit 1 = Edge Scaling 
0 Disabled 
1 Enabled 
bit 2 = Parameter List Linking 
This register bit enables the concatenation of object parameter fetches if the
objects are contiguous in memory. 


bit 3 = Shared Edges 
0 Disabled 
1 Enabled 
FOGAMOUNT ADDRESS 0018H 
bits 4-0 = ISP fog amount 
FOGCOLOR ADDRESS 0038H 
bits 23 - 0 = TSP fog color 
bit 24 = TSP fog enable 
GPPORT ADDRESS 0058H 
bits 31-16 = Direction Control for GP(15:0)
0: Input
1: Output 
bits 15 - 0 = Data for GP(15:0) 
ID (READ ONLY) ADDRESS 0000H 
bits 31-16 = Vendor ID.
bits 15 - 0 = Device ID. 


N.B. This location returns the same values as the Configuration space:
VENDOR ID : 1033 
DEVICE ID : 0046 


INTMASK ADDRESS 0010H 
bit 0 = End of Render Video. 
bit 1 = End of Render TSP. 
bit 2 = End of Render ISP.
bit 3 = PCI Aborted 


A value of 1 causes the relevant interrupt to activate the INTA pin. 
INTA returns inactive on a read to INTSTATUS. 


INTSTATUS ADDRESS 000CH
bit 0 = End of Render Video.
bit 1 = End of Render TSP.
bit 2 = End of Render ISP.
bit 3 = PCI Aborted


N.B. This register is cleared each time it is read.
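Because INTSTATUS clears on every read, a handler should read it exactly once and act on that snapshot. A minimal C sketch follows, assuming memory space 0 is mapped at `base`; the event-counter struct and function names are hypothetical, and the offsets and bit assignments are from the INTSTATUS and INTMASK descriptions.

```c
#include <stdint.h>

#define PCX2_INTSTATUS 0x000Cu
#define PCX2_INTMASK   0x0010u

#define PCX2_INT_EOR_VIDEO (1u << 0)
#define PCX2_INT_EOR_TSP   (1u << 1)
#define PCX2_INT_EOR_ISP   (1u << 2)
#define PCX2_INT_PCI_ABORT (1u << 3)

/* Hypothetical driver state updated by the handler. */
struct pcx2_events { unsigned video, tsp, isp, aborts; };

/* Read INTSTATUS once (which also deasserts INTA) and dispatch on the
 * snapshot; a second read would return the bits already cleared. */
static uint32_t pcx2_irq(volatile uint32_t *base, struct pcx2_events *ev)
{
    uint32_t status = base[PCX2_INTSTATUS / 4];
    if (status & PCX2_INT_EOR_VIDEO) ev->video++;
    if (status & PCX2_INT_EOR_TSP)   ev->tsp++;
    if (status & PCX2_INT_EOR_ISP)   ev->isp++;
    if (status & PCX2_INT_PCI_ABORT) ev->aborts++;   /* see ABORTADDR (0054H) */
    return status;
}

/* Example policy: only End of Render TSP and PCI Aborted drive INTA. */
static void pcx2_enable_irqs(volatile uint32_t *base)
{
    base[PCX2_INTMASK / 4] = PCX2_INT_EOR_TSP | PCX2_INT_PCI_ABORT;
}
```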
ISP_BASE ADDRESS 0028H
bits 19-0 = Base address for ISP buffer C in external 
SDRAM. 
LL_DIAG ADDRESS 0068H 
bits 31-0 = Parameter Linked List Diagnostic (Read only)
LSTRIDE ADDRESS 0048H 
bits 12 - 2 = Width of video line in the frame buffer. 


(N.B. This is a byte value, 32 bit word aligned, so bits 1 - 0 = 00.)


MEMTEST_DATA ADDRESS 01E8H
bits 31 - 0 = Data value, written true then complement to
consecutive locations during the data memory test.


MEMTEST_MODE ADDRESS 01ECH
bit 0 = Memory test mode.
0: Data Test
1: Address Test


(N.B. A memory test is initiated by a write to this register, and takes 6020
cycles for the data test and 1024 cycles for the address test to complete. On
completion, the values 80000000, 00000000, 00000000, 00000000 should be
read from the result registers if the test has passed.)


MEMTEST_RES1 ADDRESS 01F0H
bit 31 = Memory Test Finished
bits 24 - 20 = Result from ISP TLB 
bits 19 - 15 = Result from TSP Cache 64x10 B. 
bits 14 - 10 = Result from TSP Cache 64x10 A. 
bits 9-5 = Result from TSP Cache 512x32 B. 
bits 4-0 = Result from TSP Cache 512x32 A. 
MEMTEST_RES2 ADDRESS 01F4H 
bits 24 - 20 = Result from ISP Cache B mid 
bits 19 - 15 = Result from ISP Cache B low 
bits 14-10 = Result from ISP Cache A high 
bits 9-5 = Result from ISP Cache A mid 
bits 4-0 = Result from ISP Cache A low 
MEMTEST_RES3 ADDRESS 01F8H
bits 24 - 20 = Result from TSP FE Fifo 1
bits 19 - 15 = Result from TSP FE Fifo 2
bits 14 - 10 = Result from TSP FE Fifo 4
bits 9-5 = Result from TSP FE Fifo 5
bits 4-0 = Result from ISP Cache B high
MEMTEST_RES4 ADDRESS 01FCH
bits 24 - 20 = Result from PCI Address Fifo
bits 19 - 15 = Result from PCI Video Data Fifo
bits 14 - 10 = Result from PCI Slave Data Fifo
bits 9-5 = Result from ISP Output Fifo
bits 4-0 = Result from ISP to SDRAM Fifo
OBJECT_OFFSET ADDRESS 001CH
bit 0 = 0: Absolute PCI Address used
1: Object offset uses TLB
bits 31-2 = Base address for ISP pointers.
(N.B. This is a byte value, 32 bit word aligned, so bits 1 - 0 = 00.)
PACKMODE ADDRESS 0040H 


bits 1-0 = PACKING CONTROL
00: 32 bit argb.
01: 24 bit packed.
10: 16 bit packed 565.
11: 16 bit packed 555.
bit 2 = ENDIAN 0 = Little
1 = Big
bit 3 = RBSWAP 0 = Disabled
1 = Enabled
bit 4 = DITHER 0 = Disabled
1 = Enabled
bit 5 = KVALUE Value of k in rgb 555 mode
bit 6 = ALPHAMASK 0 = Alpha value in argb mode not masked
1 = Alpha value in argb mode masked
bit 7 = BYTESWAP 0 = Disabled
1 = Enabled 
PAGE_CTRL ADDRESS 0020H
bits 7-0 = PAGE_16M
ISP table look up 16MB offset
bits 9-8 = PAGE_SIZE
00: 4KB.
01: 8KB.
11: 16KB.
PREC_BASE ADDRESS 002CH 
bits 19 - 0 = Base address of precalc parameters in external
SDRAM. 
POWERDOWN ADDRESS 0078H
bit 0 = Enables power down of internal RAM between
end and start of render 
REVISION (READ ONLY) ADDRESS 0004H 
bits 15-0 = Revision 
SOFADDR ADDRESS 004CH 
bits 31 - 2 = Absolute PCI base address for framebuffer start
of field. 
(N.B. This is a byte value, 32 bit word aligned, so bits 1 - 0 = 00.) 
SOFTRESET ADDRESS 0008H 
bit 0 = Pipeline soft reset.
0: not reset 
1: reset. 
STARTRENDER ADDRESS 0014H 
Writing to this address causes rendering to commence. 
TMEM_READ ADDRESS 006CH
bit 0 = Enable texture SDRAM readback when PCI is
operating at 66 MHz 
TMEM_REFRESH ADDRESS 0034H 
bits 7-0 = SDRAM refresh frequency 
bits 10 - 8 = rd_to_wr (value = 4) 
Refresh frequency is the number of PCX2 clock cycles before a refresh is
requested within the SDRAM controller. Refresh has high priority and
overrides other accesses to the SDRAM; however, refreshes are kept pending
and scheduled into idle times and page breaks in other accesses to minimise
their impact on performance.


The formula for the refresh_count value (rounded down to the nearest integer) is:
refresh_count = Refresh period / (t * number of rows in SDRAM)
where t is the PCX2 clock period.
bits 10-8 = rd_to_wr (value = 4) 


The number of cycles to wait when a write directly follows a read. This should
be 4 for most devices, given the CAS latency of 3 cycles.


TMEM_SETUP ADDRESS 0030H 


bits 3 - 0 = act_to_pre 


This is the time from active to precharge, normally specified as tRAS. This 
value is applied independently to each SDRAM bank (the same value is used 
for both banks). 


bits 7 - 4 = ref_to_act 


The time from refresh to active, or refresh to refresh, normally specified as
tRC.


bits 10-8 = act_to_rw 

The time from active to read or write, normally specified as tRCD. 
bits 14-12 = wr_to_pre 

The time from data-in to precharge, normally specified as tDPL. 
bits 18-16 = rd_to_pre 


The time from read to precharge. Some SDRAMs require one more cycle than
others on this parameter, because they treat precharge as a burst stop on the
current cycle, whilst others apply it to the following cycle.


bits 22-20 = pre_to_act 
The time from precharge to active commands, normally specified as tRP. 
bits 26-24 = act0_to_act1


The time from active command on one bank, to active on the other bank. 
Normally specified as tRRD. 


bits 29-28 = memory configuration 
00: 1MB (1x32 bit device) 
01: 2 MB (2x32 bit devices)
10: 4 MB (2x16 bit devices) 
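A C sketch of assembling a TMEM_SETUP value from its fields, with a range check against each documented field width. It assumes each field holds a count of PCX2 clock cycles written as-is (the document does not say whether an offset such as N-1 applies), and the example timings in the usage line are hypothetical; real values come from the SDRAM datasheet.

```c
#include <stdint.h>

struct tmem_setup {
    unsigned act_to_pre;   /* bits 3-0,   tRAS */
    unsigned ref_to_act;   /* bits 7-4,   tRC  */
    unsigned act_to_rw;    /* bits 10-8,  tRCD */
    unsigned wr_to_pre;    /* bits 14-12, tDPL */
    unsigned rd_to_pre;    /* bits 18-16       */
    unsigned pre_to_act;   /* bits 22-20, tRP  */
    unsigned act0_to_act1; /* bits 26-24, tRRD */
    unsigned mem_config;   /* bits 29-28: 0 = 1MB, 1 = 2MB, 2 = 4MB */
};

/* OR v into *reg at bit position shift; returns 0 if v exceeds width bits. */
static int put_field(uint32_t *reg, unsigned v, unsigned shift, unsigned width)
{
    if (v > (1u << width) - 1u)
        return 0;
    *reg |= (uint32_t)v << shift;
    return 1;
}

/* Returns 1 and stores the register value in *out, or 0 if a field is out of
 * range or mem_config is the undocumented encoding 11. */
static int tmem_setup_pack(const struct tmem_setup *s, uint32_t *out)
{
    uint32_t r = 0;
    int ok = s->mem_config != 3
          && put_field(&r, s->act_to_pre, 0, 4)
          && put_field(&r, s->ref_to_act, 4, 4)
          && put_field(&r, s->act_to_rw, 8, 3)
          && put_field(&r, s->wr_to_pre, 12, 3)
          && put_field(&r, s->rd_to_pre, 16, 3)
          && put_field(&r, s->pre_to_act, 20, 3)
          && put_field(&r, s->act0_to_act1, 24, 3)
          && put_field(&r, s->mem_config, 28, 2);
    if (ok)
        *out = r;
    return ok;
}
```

For instance, `{5, 7, 2, 1, 2, 2, 2, 2}` (4MB configuration) packs to 22221275H.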
XCLIP ADDRESS 0050H


bits 9-0 = left clip value (pixel number). 


bit 12 = left clipping enable. 
bits25-16 = right clip value (pixel number). 
bit 28 = right clipping enable. 
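A small C helper for composing an XCLIP value from the bit layout above. The function name and the convention of passing a negative value to leave a side unclipped are hypothetical; whether the clip pixel itself is drawn is not stated here, so the helper only places the values.

```c
#include <stdint.h>

/* XCLIP (0050H): bits 9-0 left clip pixel, bit 12 left clipping enable,
 * bits 25-16 right clip pixel, bit 28 right clipping enable.
 * A negative argument leaves that side disabled; values above 1023 are
 * truncated to the 10-bit field. */
static uint32_t xclip_value(int left, int right)
{
    uint32_t r = 0;
    if (left >= 0)
        r |= ((uint32_t)left & 0x3FFu) | (1u << 12);
    if (right >= 0)
        r |= (((uint32_t)right & 0x3FFu) << 16) | (1u << 28);
    return r;
}
```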
FOG TABLE ADDRESS 0200 - 03FC 
bits 7 - 0 : DATA 


Read access to the Fog ROM within PCX2 is provided for test purposes only.


ISP TLB ADDRESS 0400 - 05FC


bits 11 - 0 : DATA 
DIVIDER TABLE ADDRESS 0800 - 0FFC


The Divider LUT within PCX2 is 256 locations by 44 bits wide. The LUT is a
ROM; read access to it is provided for test purposes only. Bits (8:1) of the
address select the location to read, and bit 0 determines whether the bottom 32
or upper 12 bits are being accessed. An access to the lower address causes the
fetch from the LUT; the upper 12 bits are stored in a holding register and are
returned on a read of the upper address.


In each case, the lower word must be accessed before the upper word.
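A C sketch of reading one divider LUT entry in the required order. It assumes that "address bits (8:1)" refer to the 32-bit word index within the table, so each location occupies two consecutive words (consistent with 256 locations filling 0800H to 0FFCH); the function name is hypothetical.

```c
#include <stdint.h>

#define PCX2_DIVIDER_TABLE 0x0800u   /* 0800H - 0FFCH in memory space 0 */

/* Reads the 44-bit entry at location 0-255: the low word first (which
 * fetches the entry and latches the upper bits), then the upper 12 bits
 * from the holding register. */
static uint64_t pcx2_read_divider(volatile uint32_t *base, unsigned location)
{
    uint32_t word = PCX2_DIVIDER_TABLE / 4 + ((location & 0xFFu) << 1);
    uint32_t lo = base[word];                 /* must be read first */
    uint32_t hi = base[word + 1] & 0xFFFu;    /* upper 12 bits */
    return ((uint64_t)hi << 32) | lo;
}
```

Using separate statements for the two volatile reads guarantees the compiler issues them in the documented lower-then-upper order.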
6. Pinout
Signal    Description    Type    Pin Num


See Section 4.3.3 
VDD (5V)      Digital Power (5 V)
              Pin List: 3, 15, 19, 25, 34, 38, 50, 54, 103, 107, 154, 158,
              193, 207

VDD (3.3V)    Digital Power (3.3 V)
              Pin List: 8, 13, 21, 27, 32, 40, 45, 53, 66, 78, 91, 104, 114,
              122, 130, 135, 142, 147, 157, 169, 177, 183, 195, 208, 186

              Digital Ground
              Pin List: 1, 2, 9, 14, 20, 26, 33, 39, 44, 51, 52, 61, 67, 73,
              79, 84, 90, 96, 102, 105, 106, 115, 123, 131, 136, 141, 148,
              155, 156, 159, 164, 173, 178, 182, 188, 194
7. Electrical Data


7.1 Maximum Ratings 


Ambient temperature                     0°C to 70°C
Storage temperature
DC Supply Voltage
I/O Pin Voltage with respect to Vss


7.2 DC Specifications 


7.3 AC Specifications 


7.3.1 Clock Timing 


7.3.2 Input / Output Timing 
8. Mechanical Data
8.1 Thermal Specifications 


8.2 Mechanical Dimensions 


PowerVR PCX2 is packaged in a 208-pin PQFP. 